(57) Abstract: A method of filtering data to predict an observation about an item for a particular case is provided in which: a [...] values of the item profiles; and the profiles found are used together with the function to predict an observation for a particular case [...]

Collaborative Filtering

The present invention relates to a method of filtering
data in which a dataset of observations about a set of
different items for a set of different cases is analysed
to determine various characteristics of the dataset.
Thus, for example, the observations could reflect the
suitability of the different items for a plurality of
users (each user representing a different case) and the
characteristics determined when the data is analysed
could be used to predict the suitability of one or more
items for a user.

The method of the invention has particular application
in e-commerce, for example Internet web-sites
for selling products such as books, music and holidays,
but also in call centres and telesales and in
traditional (B&M) retailing.

Various collaborative filtering systems which use a
database containing data representing user preferences
to predict a topic or product which a user might like
are known in the art. Typically, a user logs onto a
website such as, for example, the Amazon.com website,
which deals chiefly in book sales. The user is given a
user ID when first using the site so that any data
obtained from previous site visits will be retrieved and
used when the user logs on in the future.

One known filtering method, memory based reasoning
(MBR), correlates the preferences of users in the data
set for various items with preferences provided by the
user for some of the items in the data set. The system
then recommends to the user other items that similar
users in the data set liked. However, this method can
be slow if all other users in the data set are used to
make a recommendation, involves losing information if
only a subset is used, and is subject to known sources
of inaccuracy such as how to weight the preferences of
each of a set of very similar users since the
informational content of each is low. Consequently, the
method is disadvantageous (and may not be practical) in
situations where there is a large data set, i.e. a large
number of users recommending a large number of items.
The method is also disadvantageous in that an operator
cannot see how the recommendations made correspond to
the dataset. This is a particular problem in certain
marketing situations where transparency of the
recommendations made is required.

One solution which has been proposed to this problem is
the use of clustering techniques. Thus, users having
similar preferences are grouped into clusters and the
probability of a user belonging to any one cluster is
calculated so that a weighting can be assigned to each
item to be recommended to the user. However, when
clustering users into groups, it is assumed that all
users in a cluster or group have the same rating for all
items. Further, the rating of an item for a user will
be based only on the history of users in one cluster
such that a large amount of available data will be
disregarded. Moreover, the number of clusters is
intrinsically limited by the requirement that each
cluster must contain a sufficiency of members to allow
statistically meaningful results. Thus, clustering
techniques are thought to be inaccurate or imprecise.

One clustering approach to collaborative filtering is
the Bayesian clustering approach. This is based on a
predictive model. The model supposes that a user can
be described by a single variable that assigns the user
to one of a finite set of classes.
The predictive model is a set of likelihood functions,
one for each item, that specify the probability of the
item being suitable for a user, depending on their
class.



An example for one of the likelihood functions might be: 
Probability the user has seen the movie 'Titanic' is 



This method is described in greater detail in Breese,
Heckerman and Kadie, "Empirical Analysis of Predictive
Algorithms for Collaborative Filtering", Proceedings of
the Fourteenth Conference on Uncertainty in Artificial
Intelligence, Madison, WI, 1998.

The method has advantages over MBR. In particular it is
fast, since recommendations are based on a model, and in
principle the model can be investigated to assess
whether its behaviour accords with an administrator's
preferences. On the other hand the method is not as
accurate, since users are assumed to belong to one of a
limited number of classes, and all predictions are the
same across members of the same class. The number of
classes cannot grow too large because there need to be
enough members in each class to generate statistically
meaningful estimates. Moreover, investigating the model
simply leads to a list of probabilities for the items,
one list for each class. This does not generate
intuitive understanding about its behaviour, so that the
ability of administrators to assess and control it is
limited.




It is an object of the present invention to provide a
filtering method which is capable of overcoming the
problems associated with the prior art.



From a first aspect, the present invention provides a 
method of filtering data to predict an observation about 
an item for a particular case, in which: a set of data 
representing actual observations about a plurality of 
items for a plurality of different cases is modelled as 
a function of a plurality of case and item profiles, 
each profile being a set of parameters comprising at 
least one hidden metrical variable, the parameters 
defining characteristics of the respective case or item; 
a best fit of the function to the data is approximated 
in order to find the values of the item profiles; and 
the profiles found are used together with the function 
to predict an observation for a particular case about 
one or more items for which data is not available for 
that case. 



It will be understood that, using the method described
above, all of the data obtained may be used in
predicting the observation about the item(s). Thus, no
data need be ignored or wasted.

The method of the invention differs from the prior art
naive Bayes approach described above in that in the
method of the invention the case profiles are not labels
which identify the class to which the case belongs.
Instead they include metrical variables - numbers that
enter into the predictive models as meaningful
parameters. The use of the method of the invention
provides a filtering method which is fast, accurate and
generates relevant marketing knowledge about the data.
In addition, it is easy for a user such as, for example, a
marketing executive to understand the pattern of
predictions which can be obtained using the method of
the invention. Further, the pattern of predictions may
be easily controlled, as will be discussed further below.

From a further aspect, the present invention provides a
method of filtering data to predict an observation about
an item for a particular case in which: a set of data
representing actual observations about a plurality of
items for a plurality of different cases is modelled as
a function of a plurality of case and item profiles; a
best fit of the function to the data is approximated in
order to find the profiles; and the profiles found are used
together with the function to predict an observation for
a particular case about one or more items for which data
is not available for that case.

Preferably, the function which models the data set is
made up of a plurality of models, each model
representing the observations about one item for the
cases in the data set. Each model is preferably derived
by identifying a model type which most closely fits the
data available for the item in question. For example,
the model might be based on a logistic curve or on a
neural network. The exact model which best fits the
available data is identified by a set of the unknown
parameters which is referred to as the item profile and
preferably comprises a vector of metrical components.
The model further includes another set of unknown
parameters known as the case profile. This is a vector
including metrical components identifying various
unknown characteristics of the case which, for example,
could be a user, in which case the characteristics would
be assumed to cause them to like or dislike various
items.

In the function which models the data set, the
observations about items for cases are preferably
independent, conditional on the case profiles. This
allows the function to be used in a tractable, sensible
way.
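As an illustration of the two preceding paragraphs, the following is a minimal sketch, assuming binary observations (1 = suitable, 0 = not suitable) and a logistic curve for every item. The parameterisation of an item profile as an intercept plus one slope per case profile component, and the names `item_probability` and `case_log_likelihood`, are assumptions made for the example and are not taken from the specification.

```python
import math


def item_probability(case_profile, item_profile):
    """Logistic item model: probability of a positive observation.

    item_profile is (intercept, slopes); slopes has one entry per
    component of the case profile (an assumed parameterisation).
    """
    intercept, slopes = item_profile
    score = intercept + sum(s * t for s, t in zip(slopes, case_profile))
    return 1.0 / (1.0 + math.exp(-score))


def case_log_likelihood(case_profile, item_profiles, observations):
    """Log-likelihood of one case's observations.

    observations maps item id -> 0 or 1. Because observations are taken
    to be independent conditional on the case profile, the joint
    log-likelihood is a plain sum over the observed items.
    """
    total = 0.0
    for item_id, observed in observations.items():
        p = item_probability(case_profile, item_profiles[item_id])
        total += math.log(p if observed == 1 else 1.0 - p)
    return total


# Hypothetical item profiles with a one-component case profile.
item_profiles = {
    "book_a": (0.0, [1.5]),
    "book_b": (-1.0, [-2.0]),
}

print(item_probability([0.0], item_profiles["book_a"]))
print(case_log_likelihood([0.8], item_profiles, {"book_a": 1, "book_b": 0}))
```

The conditional independence assumption is what lets the likelihood factorise item by item, so each item model can be evaluated without reference to the data for any other item.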

Preferably, the models which make up the function are
learnt from past observations, i.e. the models are
chosen to give a good fit between modelled observation
predictions and actual instances of past observations.

The models used may be stochastic with a specified
distribution on the error terms, so that a likelihood for
past observations given the model can be specified and
the item profiles can then be estimated using the
techniques that fall under the heading of maximum
likelihood estimation in statistics to maximise the
likelihood of past observations. Alternatively, for
example, models could be fitted to the data by using
estimation procedures that seek to minimise some
function of the errors, such as least squares and its
variants. Alternatively, a stochastic model could be
estimated using Bayesian methods.



In an alternative, however, a set of models may be built
by an expert to behave in ways which they think
appropriate.

In one preferred form of the method of the invention,
point estimates of the parameters of the case and item
profiles are found for the dataset and these are used to
predict an observation. The method of decomposing the
dataset into a plurality of case and item profiles in
this way is considered to be novel and inventive in its
own right and so, from a second aspect, the invention
provides a method of filtering data to predict an
observation about an item for a particular case, in
which a set of data is obtained representing actual
observations for a plurality of cases, including the
particular case, of a plurality of items, a function
which models the data set is solved so that the data is
decomposed into a plurality of case profiles and item
profiles, and an observation for the particular case
about an item is predicted using the case profiles and
item profiles obtained.

Thus, again, using the method of the invention described
above, all of the data obtained may be used in
predicting an observation about an object for a
particular case. Thus, no data need be ignored or
wasted and, as data relating specifically to the case in
question is used to obtain the case profiles, the
predictions obtained with the method will generally be
more accurate than those obtained with clustering
methods, particularly in situations where there is only a
relatively small amount of data available.

Preferably, the function is maximised so as to determine
the case and item profiles.

Still more preferably, the data set is modelled as a
function of the likelihood of the data in the data set
being present and the function is solved by choosing
item profiles and case profiles which maximise the
likelihood of the data in the data set being present.

Still more preferably, the function is maximised
iteratively such that one of the case and item profiles
is held constant during each iteration.
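The iterative maximisation described above can be sketched as follows. This is only an illustration under stated assumptions: binary observations, a one-component case profile, a logistic item model with an (intercept, slope) item profile, and plain gradient ascent with a fixed step size as the update rule. None of these choices, nor the function names or toy data, are prescribed by the specification.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(data, thetas, items):
    """Log-likelihood of all observations; data maps (case, item) -> 0/1."""
    total = 0.0
    for (c, i), x in data.items():
        a, b = items[i]
        p = sigmoid(a + b * thetas[c])
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def update_cases(data, thetas, items, step):
    """One gradient step on the case profiles; item profiles held constant."""
    grads = {c: 0.0 for c in thetas}
    for (c, i), x in data.items():
        a, b = items[i]
        grads[c] += (x - sigmoid(a + b * thetas[c])) * b
    return {c: thetas[c] + step * grads[c] for c in thetas}


def update_items(data, thetas, items, step):
    """One gradient step on the item profiles; case profiles held constant."""
    grads = {i: [0.0, 0.0] for i in items}
    for (c, i), x in data.items():
        a, b = items[i]
        err = x - sigmoid(a + b * thetas[c])
        grads[i][0] += err
        grads[i][1] += err * thetas[c]
    return {i: (items[i][0] + step * grads[i][0],
                items[i][1] + step * grads[i][1]) for i in items}


def fit(data, n_rounds=300, step=0.05):
    """Alternate between the two updates for a fixed number of rounds."""
    thetas = {c: 0.0 for c in sorted({c for c, _ in data})}
    items = {i: (0.0, 1.0) for i in sorted({i for _, i in data})}
    for _ in range(n_rounds):
        thetas = update_cases(data, thetas, items, step)
        items = update_items(data, thetas, items, step)
    return thetas, items


# Hypothetical observations: only some (case, item) pairs are present.
data = {
    ("u1", "a"): 1, ("u1", "b"): 1, ("u1", "c"): 0,
    ("u2", "a"): 1, ("u2", "b"): 1,
    ("u3", "c"): 1, ("u3", "d"): 1, ("u3", "a"): 0,
    ("u4", "c"): 1, ("u4", "d"): 1, ("u4", "b"): 0,
}

initial_thetas = {c: 0.0 for c in ("u1", "u2", "u3", "u4")}
initial_items = {i: (0.0, 1.0) for i in ("a", "b", "c", "d")}
initial_ll = log_likelihood(data, initial_thetas, initial_items)
thetas, items = fit(data)
final_ll = log_likelihood(data, thetas, items)
print(initial_ll, final_ll)
```

Holding one set of profiles constant turns each half-step into an ordinary logistic fitting problem, which is why the joint problem stays tractable even though both sets of profiles are unknown.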

One advantage of this method is that all the information
in the data is used and yet the number of parameters
that are used to make recommendations scales linearly
with the number of items (objects). In a Bayesian
network or decision tree approach as used in many prior
art methods, by contrast, either information is
discarded or the number of parameters potentially scales
as the square of the number of items (objects).

In an alternative preferred filtering method according
to the invention, point estimates of the case and item
profiles are not derived but rather a prior distribution
is assumed over possible case profiles and point
estimates of the item profiles are then obtained. This
method is believed to be novel and inventive in its own
right.

From a further aspect therefore, the invention provides 
a method of filtering data to predict an observation 
about an item for a particular case, in which a set of 
data is obtained representing actual observations for a 
plurality of cases about a plurality of items, a 
function which models the data set as a function of a 
plurality of item profiles and a prior distribution over
a plurality of possible case profiles is set up to 
provide point estimates of the item profiles that fit 
the function to the data, and an observation about an 
item for a particular case is predicted using the item 
profile point estimates obtained together with a set of 
data representing observations about a plurality of 
items for the said particular case. 

In this method, as the data is modelled in such a way
that only point estimates of the item profiles are found
(i.e. point estimates of the case profiles are not
obtained), the dimensionality of the process of solving
the function is much lower than it would be if no prior
distribution over case profiles were assumed. Thus,
this feature reduces the sampling variance of the
estimated item profiles, improving the prediction
performance. Consequently, the method allows a good,
relatively accurate solution to the data set to be found
by relatively simple computation.






An observation about an item for a particular case can
be predicted using various alternative methods. In two
particularly preferred forms of the invention, the
observation can be predicted either by using the item
profile point estimates together with the function which
models the data set to obtain a prediction of the
observation directly or by updating a prior distribution
over possible case profiles using Bayesian inference,
the data relating to the particular case, and the
function.

Most preferably, the prediction of an observation about
an item for a case is estimated by Bayesian inference
about the case profile. Thus, the observation can be
predicted by updating a prior distribution over possible
case profiles using Bayesian inference, the data
relating to the particular case and the function.

It will be understood that this recommendation method
could be implemented by a single function such that the
prior distribution is not explicitly updated but is
updated only implicitly. As the item profiles are estimated
based on an assumed prior distribution of the case
profiles, the method of obtaining the item profiles is
more closely linked to the prediction method using
Bayesian inference, which also uses an assumed prior
distribution of the case profiles, than it would be if
point estimates of both the item and case profiles were
obtained. This also leads to potentially more
satisfactory results being obtained from the prediction
method of the invention. Further, this method is
equally applicable to the case in which point estimates
of item profiles and case profiles are obtained.

From a further aspect therefore, the invention provides
a method of filtering data to predict an observation
about an item for a particular case, in which a set of
data representing actual observations for a plurality of
cases about a plurality of items is modelled by a
function, and the function is solved so as to decompose
the data into a plurality of case profiles and a
plurality of item profiles, and an observation for the
particular case about an item is predicted by Bayesian
inference using the case profiles and item profiles
obtained together with a set of data representing
observations about a plurality of items for the said
particular case.



Preferably the case profiles obtained are used to obtain
a prior probability distribution over possible case
profiles for the said particular case and the prior
probability distribution is then used in the Bayesian
inference.



Preferably the prior probability distribution is
generated by taking an average of the case profiles in
the data set.



Preferably a posterior probability distribution over 
possible case profiles for the said particular case is 
generated from the prior probability distribution by 
Bayesian inference using the set of data relating to the 
said case and a function modelling the likelihood of the 
data set being present. 

Preferably the posterior probability distribution is 
used to generate a probability distribution over 
possible observations about items for the particular 
case. 
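The chain from prior, to posterior, to a distribution over possible observations can be sketched numerically as follows. This is a hedged illustration, not the specification's implementation: it assumes binary observations, a one-component case profile approximated on a grid of points, a standard normal prior, and logistic item models with hypothetical (intercept, slope) item profiles. The update uses only the items for which the case actually has an observation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def posterior_on_grid(observations, item_profiles, grid):
    """Posterior weights over grid values of a one-component case profile.

    Prior times likelihood at each grid point, then normalised. Only the
    items present in `observations` contribute to the likelihood.
    """
    weights = []
    for t in grid:
        w = normal_pdf(t)  # assumed standard normal prior
        for item_id, x in observations.items():
            a, b = item_profiles[item_id]
            p = sigmoid(a + b * t)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def predict(item_id, posterior, item_profiles, grid):
    """Posterior-averaged probability of a positive observation."""
    a, b = item_profiles[item_id]
    return sum(w * sigmoid(a + b * t) for w, t in zip(posterior, grid))


grid = [-4.0 + 0.1 * k for k in range(81)]
# Hypothetical item profiles: (intercept, slope).
item_profiles = {"a": (0.0, 2.0), "b": (0.0, 1.5), "c": (0.0, -2.0)}

posterior = posterior_on_grid({"a": 1}, item_profiles, grid)
print(predict("b", posterior, item_profiles, grid))
print(predict("c", posterior, item_profiles, grid))
```

In this toy setting, a positive observation for item "a" shifts the posterior towards positive profile values, which raises the predicted probability for "b" (same sign of slope) and lowers it for "c" (opposite sign).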



Preferably, only the data relating to those items for
which observations have been obtained for the case is
used in updating the prior distribution over possible
case profiles. This improves the results obtained as it
avoids the bias effect from assuming, for example, that
for a particular case, there is a reason why no
observation has been recorded for an item.

Preferably, each case is a different user of a
prediction system such that observations by that user
about various items are included in the dataset.

Preferably the function is made up of a plurality of
models, each model representing the suitability of an
item for a user. Still more preferably, each model of
the suitability of an item for a user depends directly
only on the user (or case) profile and the profile for
that item, and not directly on any of the data relating
to the suitability for the user of any other item.

Preferably the item profiles are estimated as those
parameters which maximise the fit between the function
which models the data set and the data.

Preferably the number of components of each item profile
is set by the profile engine to maximise the
effectiveness of the function in making predictions.
Still more preferably, this is done using standard model
selection techniques such as the Akaike information
criterion.
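For reference, the Akaike information criterion is AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximised likelihood; the candidate with the lowest AIC is preferred. The sketch below shows how a number of profile components might be chosen this way. The log-likelihoods and parameter counts are made-up numbers for illustration only, and the function names are hypothetical.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: lower is better."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


def choose_n_components(candidates):
    """Pick the number of components with the lowest AIC.

    candidates maps number of components ->
    (maximised log-likelihood, number of free parameters).
    """
    return min(candidates, key=lambda k: aic(*candidates[k]))


# Illustrative values only: adding a third component barely improves
# the fit, so its extra parameters are not worth the penalty.
candidates = {
    1: (-520.0, 40),
    2: (-470.0, 60),
    3: (-465.0, 80),
}

best = choose_n_components(candidates)
print(best, {k: aic(*v) for k, v in candidates.items()})
```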

Still more preferably, the data set is modelled as a
function of the expected likelihood of the data in the
data set being present and the item profiles are chosen
as the parameter values which maximise the likelihood of
the data in the data set being present given the
function and the assumed prior distribution of the case
profiles.

Still more preferably, the function is maximised 
iteratively and in the preferred embodiment, an EM 
algorithm is used to do this. 
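One way to write down the quantity being maximised is given below as a sketch. The notation is an assumption introduced here, not the specification's: x_{ci} is the observation for case c and item i, O_c is the set of items observed for case c, theta is a case profile with prior density pi, and beta_i is the profile of item i.

```latex
% Expected (marginal) likelihood of the data given the item profiles
L(\beta_1,\dots,\beta_J)
  = \prod_{c=1}^{N} \int \Big[ \prod_{i \in O_c} p(x_{ci} \mid \theta, \beta_i) \Big]
    \, \pi(\theta) \, d\theta

% EM iteration t: the E-step forms the expected complete-data
% log-likelihood under the current posterior over each case profile;
% the M-step maximises it over the item profiles.
Q(\beta \mid \beta^{(t)})
  = \sum_{c=1}^{N} \mathbb{E}_{\theta \mid x_c,\, \beta^{(t)}}
    \Big[ \sum_{i \in O_c} \log p(x_{ci} \mid \theta, \beta_i) \Big],
\qquad
\beta^{(t+1)} = \arg\max_{\beta} \, Q(\beta \mid \beta^{(t)})
```

Because Q is a sum over items, the M-step separates into one maximisation per item profile.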


Preferably the prior distribution over each component of
the plurality of possible case profiles is assumed to be
a standard normal distribution and the components are
assumed to be independent. Still more preferably, this
distribution is also used in the Bayesian inference to
estimate the observation about an item for the
particular case.

Preferably a posterior probability distribution over
possible case profiles for the said particular case is
generated from the prior probability distribution by
Bayesian inference using the set of data relating to the
said particular case and a function modelling the
likelihood of the data set being present.

Preferably the posterior probability distribution is
used to generate a probability distribution over
possible observations about items for the particular
case.



In one embodiment the data set includes ratings given by 
users for various items and the posterior probability 
distribution is used to generate a probability 
distribution over possible ratings for items by the 
user. 



Preferably the probability distribution over possible 
preferences or ratings for items by the user is used to 
estimate the preference or rating of the user for each 
of a set of items. 

From a still further aspect, the present invention
provides a method of filtering data to predict an
observation about an item for a particular case, in
which a set of data is obtained representing actual
observations for a plurality of cases about a plurality
of items, a function which models the data set as a
function of a set of case profiles and a set of item
profiles comprising sets of parameters is set up,
wherein the case and item profiles each comprise at
least one hidden metrical variable, the parameters
defining the characteristics of each said respective
case and item, the method comprising the steps of:

a) estimating the values of the case profile
parameters by solving a hidden variable model of
the dataset;

b) using the estimated values of the case profile
metrical variables in the function to estimate the
values of the item profile metrical variables; and

c) predicting an observation about an item for a
particular case using the item profile values
obtained together with a set of data representing
observations about a plurality of items for the
said particular case.



This method is relatively fast and simple to implement
as it can be implemented using widely available and
familiar algorithms. The method has the advantage that
once the case profiles have been estimated such that
they can be treated as known variables, a wide range of
familiar curve fitting and statistical techniques can be
used to estimate the item profiles. This allows a
modeller to use widely available statistical packages to
estimate item profiles for a variety of possible item
functions.

Further, by estimating values of the case profiles and
using those estimated values to estimate the item
profile values, the dimensionality of the dataset of
observations about cases is reduced before estimating
the item profiles. Thus, the dataset containing
observations about a possibly large number of items for
each case is reduced to a dataset containing a small
number of profile components for each case.

Preferably, the case profile values are estimated by
solving a hidden variable model of the dataset to find
approximate values of the item profile variables and the
approximate item profile values are then used to
estimate the case profile values.

Still more preferably, the hidden variable model used is
a linear model such as, for example, a standard linear
factor model or principal component analysis.

Once the case profile values have been estimated, they
are preferably substituted into the function modelling
the dataset which is then solved using maximum
likelihood techniques to find the item profile values.
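The two-stage procedure can be illustrated with a minimal sketch under strong simplifying assumptions: a complete matrix of numeric ratings (no missing entries), a single profile component taken as the first principal component (found here by power iteration), and a linear item model with Gaussian errors, for which maximum likelihood reduces to least squares. The function names and toy ratings are hypothetical.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (loadings, scores) for the first principal component.

    rows is a complete cases-by-items matrix. The scores play the role
    of one-component case profile estimates.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(row[j] * row[k] for row in centred) / (n - 1)
            for k in range(m)] for j in range(m)]
    # Deterministic start; it must not be orthogonal to the leading
    # eigenvector, which holds for the toy data below.
    v = [1.0] + [0.0] * (m - 1)
    for _ in range(n_iter):
        w = [sum(cov[j][k] * v[k] for k in range(m)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector orthogonal to the data")
        v = [x / norm for x in w]
    scores = [sum(row[j] * v[j] for j in range(m)) for row in centred]
    return v, scores


def fit_item_linear(scores, ratings):
    """Least-squares (Gaussian ML) fit of rating = intercept + slope * score.

    Relies on the scores having zero mean, which holds for PCA scores.
    """
    mean = sum(ratings) / len(ratings)
    slope = (sum(s * (r - mean) for s, r in zip(scores, ratings))
             / sum(s * s for s in scores))
    return mean, slope


# Hypothetical ratings: rows are cases, columns are items.
rows = [
    [5, 4, 1, 2],
    [4, 5, 2, 1],
    [1, 2, 5, 4],
    [2, 1, 4, 5],
]
loadings, scores = first_principal_component(rows)
intercept, slope = fit_item_linear(scores, [r[0] for r in rows])
print(loadings, scores, intercept, slope)
```

Once the scores are treated as known, each item is fitted independently, so any standard regression or curve fitting routine could be substituted for `fit_item_linear`.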

In one preferred embodiment of the invention, items in
the dataset can be considered as belonging to a
plurality of different groups, each group having a
different set of case profiles associated with it so
that the case profile values for each group are
estimated separately. This could be advantageous in
situations where the different groups largely act as
indicators of different components of the cases'
profiles as it reduces the number of free parameters
that need to be estimated for a given number of overall
components in a case profile and so could result in more
accurate predictions being made.

Alternatively or in addition, some items in the dataset
could be treated directly as observed components of the
case profile, i.e. as values of one or more of the
metrical variables. This could be advantageous in
situations where one or more items caused other aspects
of the observations rather than themselves being caused
by other things.

Once the case and item profile values have been
estimated, they can be used to estimate an observation
about an item for a case. Preferably, the prediction of
an observation about an item for the case is made by
updating a prior distribution over possible profiles for
the case by Bayesian inference and then using the
updated case profile obtained together with the function
modelling the dataset and the estimated item profile
values to make predictions. It will be understood that
this prediction method could be implemented by a single
function such that the prior distribution is not
explicitly updated but is updated only implicitly.

This method has the advantage that any point estimate of
a case profile based on the updated case profile
obtained will not be very sensitive to small changes in
the dataset. This reduces the potential for imprecision
in the estimates of the case profile to act as a source
of prediction error.

In an alternative embodiment, an observation about an
item for the case is estimated by maximising the
likelihood of the data relating to the case in question
given the function modelling the dataset and the
estimated item profile values to find the values of the
case profile, and then using the case profile obtained
together with a likelihood function and the estimated
item profiles to predict observations about items for
that case.

The entire filtering process could be carried out in
real time each time that a prediction was requested.
However, it will be appreciated that this would require
a very heavy calculation load to be carried such that a
prediction would take a relatively long time to
generate. Preferably, therefore, the item profiles and
the prior distribution over possible case profiles or
the actual case profiles are calculated in an off-line
non real-time filtering engine and are supplied to an
on-line real-time engine for use in the calculation of
predicted observations for a case when a set of data
relating to the said case is supplied to the real-time
engine. In this way, updated predictions may be
supplied in real-time without the need to recalculate
item and/or case profiles for each case and item in the
data set.

The various filtering methods of the invention as 
described above can be used in various marketing 
contexts including analytics, marketing automation and 
personalisation. 

The data representing the suitability of a plurality of
objects for a plurality of users could be obtained in
many different ways. For example, users could merely
select some objects from a group of objects and an
assumption could be made that the selected objects were
suitable for the user. Alternatively, the level of
suitability of an object could be linked to the rating
given to that object by a user.

Preferably, the data set is modelled as a function of a
plurality of unknown case and item profiles. It will of
course be understood, however, that the item and case
profiles may include information on observable
characteristics such as the age of a user so that one or
more of the case and/or item profiles in the model may
be known.






In one embodiment of the invention, the item profiles
obtained by the method of the invention could be stored
such that subsequently a particular item could be
specified and items which were similar to that
particular item would then be recommended. The
specified item could be compared to other items for
which item profiles were available using, for example, a
similarity metric based on the item profiles. A
recommendation of other items which were similar to the
specified item could then be made to the user.
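The specification leaves the choice of similarity metric open. As one possibility, the following sketch uses cosine similarity between item profile vectors; the metric, the function names and the example profiles are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two profile vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target_id, item_profiles, top_n=3):
    """Ids of the items whose stored profiles are closest to the target's."""
    target = item_profiles[target_id]
    scored = [(cosine_similarity(target, profile), item_id)
              for item_id, profile in item_profiles.items()
              if item_id != target_id]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:top_n]]


# Hypothetical stored item profiles with two metrical components.
item_profiles = {
    "book_a": [1.0, 0.2],
    "book_b": [0.9, 0.3],
    "book_c": [-0.8, 0.5],
    "book_d": [0.1, 1.0],
}

print(most_similar("book_a", item_profiles, top_n=2))
```

Because the comparison needs only the stored item profiles, it can be run without access to the original observations.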

The method of recommending similar items to a user as
described above is thought to be novel and inventive in
its own right and so, from a further aspect, the present
invention provides a method of filtering data to find
items which are similar to an item specified by a user,
in which a set of data representing observations about a
plurality of items for a plurality of cases is obtained,
a function which models the data set is used to estimate
a plurality of item profiles each containing a set of
parameters representing characteristics of the item and
at least one hidden metrical variable, and wherein items
which are similar to a specified item are found by
comparing the item profile of the specified item to
other item profiles.

In a further alternative embodiment, the item and case
profiles obtained from the filtering methods of the
invention may be used to sort items and/or cases into
groups or clusters by comparing the case and/or item
profiles and placing all those cases or items having
similar profiles into one group or cluster. Such groups
or clusters might provide useful information to
marketing organisations, for example.

This method is also considered to be novel and inventive
in its own right and so, from a further aspect, the
present invention provides a method of filtering data,
in which a set of data representing observations about a
plurality of items for a plurality of cases is obtained,
a function which models the data set is solved so that
the data is used to estimate a plurality of item
profiles each containing a set of parameters
representing characteristics of the item, and at least
one hidden metrical variable, and wherein cases and/or
items are sorted into groups or clusters such that each
group contains cases or items having similar case or
item profiles.

In some instances, the data obtained may be biased.
This may be due to the fact that users have only sampled
some of the objects about which they are asked and/or
that users have not entered data for all of the objects
which they have sampled. In order to avoid the
prediction provided by the method of the invention being
influenced by this selection bias, the method preferably
further includes the use of statistical techniques to
correct for bias in the case data prior to predicting an
observation about an item for a case.

In some instances, the data available may not be
sufficient for accurate predictions to be made. In this
case, a user could be asked to assess some further items
(referred to herein as exogenous standards) which are
not directly linked to the class of items for which
predictions of observations are being made.

Preferably therefore, the method of the invention
further comprises the step of obtaining data relating to
the assessment by a plurality of users of one or more
exogenous standards so as to increase the amount and
range of data available.

In this way, means are provided for comparing the
preferences of each of the users contributing to the
data set. This may improve the overlap between the data
sets obtained for each user.

Examples of exogenous standards which might be used are
a photograph of scenery for holiday preference selection
or descriptions of TV programmes for book preference
selection. A user's assessment of the exogenous
standard would take place either on the basis of the
information presented alone (e.g. a photograph of
scenery or a text summary of an unread book or magazine)
or on the basis of perceptions associated with the
description (e.g. users' perceptions of, say, the "Friends"
TV programme or a book or a magazine that they have
previously read). The use of such exogenous standards
may improve the assessment overlap between users. This
may help to address problems with data sparseness by
artificially increasing the pool of experiences common
to multiple users and therefore making the data set of
items to be assessed "better populated" than would
otherwise be the case. The satisfactory application of
exogenous standards requires users' preferences
regarding the exogenous standards to be at least
reasonably associative with their preferences concerning
the class of objects to be assessed. Thus, suitable
exogenous standards would be found by testing them in
advance on a test population using appropriate surveying
and analysis methods.

The use of exogenous standards to improve the population
and range of a data set to be used in the prediction of
user preferences for a particular object is thought to
be novel and inventive in its own right. Thus, from a
further aspect, the invention provides a method of
obtaining a data set from which the suitability of a
specific object for a user can be estimated, in which
data relating to the suitability for a plurality of
users of a plurality of related objects is obtained
together with data relating to the preferences of those
users for at least one exogenous standard which is not
directly related to the plurality of related objects.

It will be appreciated that the exogenous standards used
can be in multi-media and include any form of graphic
image, photograph, sound or music as well as a
conventional passage of text, a name or other written
description.

One of the most profitable applications of
personalization technologies such as collaborative
filtering is to match advertising with users on a one to
one basis so that each user sees those advertisements
that are most likely to elicit a positive response from
her. This application can either be run on a
stand-alone basis (e.g. by using passive observation of each
user's browsing behaviour and a record of click through
rates and other indicators on the part of previous users
in respect of particular advertisements to build up the
necessary user and item databases to allow collaborative
filtering) or on the back of an express personalised
recommender service, i.e. a service for predicting the
suitability of an item for a user in which data
representing the suitability of a plurality of items for
a plurality of users is obtained and analysed using for
example a filtering method according to the invention.
In the latter case difficulties may arise where
preferences concerning the object being advertised are
not strongly associative with the class of objects about
which data is held by the personalised recommender
service. In such cases the introduction of
appropriately selected exogenous standards may "bridge
the gap" allowing better prediction of preferences
concerning advertised goods (as well as helping with
data thinness as described above). The appropriate
exogenous standards must be selected through preparatory
research to be at least reasonably associative with both
the objects for which data is obtained and the
advertisements being placed.

In the data filtering method of the invention, the data
relating to the suitability of the items for the users
can be obtained by asking each user to rate their
opinion of each or some of the items (for example on a
scale of 1 to 5). However, users may well have other
information about the items or information on related
items and this information could usefully be collated.

Preferably therefore, users are given the opportunity of
giving additional details about their preferences over
and above rating the items about which they are asked.
Thus, the users can provide more information about their
preferences than is currently usable in the prediction
of the suitability of an item for a user or can be
displayed as output in the system at the time at which
they input the data. Thus, for example, a user might be
asked whether or not she had been to each of four
locations and she would answer yes or no for each of
these. If the user wished to do so however, she could
add additional information either in the form of, say,
other locations which she had visited (resulting in a
horizontal broadening of the data set) or she could, for
example, specify the attractions which she had visited
at each of the four locations (resulting in a vertical
deepening of the data set). Thus, in vertical deepening
of the data set, the user will provide data relating to
one or more attributes (e.g. the attractions at a
particular location) of one or more of the items for
which data is obtained.

This broadening or deepening of the data set could
either be done by adding to closed menu options
presented to users at the data acquisition stage or by
inviting free text inputs from the user. An advantage
of the latter route is that it provides a means to
determine what sorts of additional information would be
most commonly encountered and hence useful to predict.
This determination could be automated so that the
database could be broadened or deepened efficiently
without overburdening users with an excessive number of
options.

Once a sufficient number of users had provided 
additional information about an item or an attribute of 
an item which was not originally included in the data 
set, the data relating to that item or attribute would 
be added to the data set and used in the prediction of 
the suitability of items for subsequent users. 

The idea of allowing users to provide information of
greater detail than is at the time directly capable of
application in the calculation of suitability
predictions so that this additional data is used to
expand the data set is believed to be novel and
inventive in its own right.

Thus, from a further aspect, the invention provides a 
method of obtaining a data set from which an observation 
for a case about a specific object can be predicted, in 
which data relating to the observations for a plurality 
of cases about a plurality of predefined items is 
obtained and in which further data relating to one or 
more attributes of one or more of the predefined objects 
may also be provided for one or more of the cases. 

Preferably, a statistical model is used to determine
when an item or item attribute has been specified by a
sufficient number of users to allow it to be added into
the observation prediction data set.

Whilst collaborative filtering (and the filtering method
of the invention in particular) excels at subjective
recommendation, other methods will often be preferable
for recommendation in respect of objective criteria. As
many real life applications require recommendations /
advice based upon a mix of subjective and objective
criteria, the combination of multiple techniques may give
better results in such situations.

Consequently, a pre-filtering processing step may be
provided to carry out preliminary screening using
objective criteria to reduce the number of items that
must be assessed in the filtering step.

As, typically, it is computationally easier to screen an
item using an objective process than a filtering one,
generally pre-screening will make the overall prediction
process more efficient in the use of computer resources.

In practice, it may sometimes be most efficient to run
the pre-filtering processing stage and filtering
together such that each individual item is pre-screened
and then (if necessary) subjected to filtering.
Weighting and other adjustments can then be applied
before the process moves on to the next step.
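
The combined pre-screening and filtering pass described
above might, purely as a sketch, be arranged as follows.
The names recommend, passes_objective_rules and
predict_score are hypothetical; predict_score stands in
for whatever filtering prediction is used and is not the
filtering function of the invention itself.

```python
def recommend(items, passes_objective_rules, predict_score, top_n=3):
    """Pre-screen each item on objective criteria, then score survivors.

    passes_objective_rules and predict_score are caller-supplied
    callables (assumptions of this sketch).
    """
    scored = []
    for item in items:
        # Cheap objective test first; the costlier filtering
        # prediction is only run for items that pass it.
        if not passes_objective_rules(item):
            continue
        scored.append((predict_score(item), item))
    # Sort on the score alone so that items need not be comparable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_n]]
```

For example, a holiday recommender could pass a rule
rejecting destinations above the user's stated budget,
so that only affordable destinations are ever scored.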

Still more preferably, weighting factors may be applied 
to the data relating to the observations about items for 
the cases prior to the filtering step. 

In one preferred embodiment, the weighting factors
applied to the data reflect the time that has elapsed
since the time at which the observation about the item
was formed such that the weight of each piece of data
for predictive purposes declines with time. In this
way, the profiles obtained using the filtering method of
the invention may be made to automatically reflect the
changes in an item which occur over time.

Such a use of weighting factors is considered to be
novel and inventive in its own right and so, from a
further aspect, the present invention provides a method
of weighting data relating to observations about an item
in which the weight of the data decreases with an
increase in the time elapsed since the observation was
made.
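
A weighting of this kind could take many forms. The
sketch below assumes an exponential decay with a
half-life, which is only one possible choice and is not
specified in the description; the function names and the
default half-life of 180 days are hypothetical.

```python
def recency_weight(age_days, half_life_days=180.0):
    """Weight that halves every half_life_days (assumed exponential form)."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


def weighted_mean_rating(observations, half_life_days=180.0):
    """Recency-weighted mean of (rating, age_days) pairs."""
    pairs = [(rating, recency_weight(age, half_life_days))
             for rating, age in observations]
    total = sum(weight for _, weight in pairs)
    return sum(rating * weight for rating, weight in pairs) / total
```

With these defaults an observation made 180 days ago
carries half the weight of one made today, so a recent
change in an item pulls the weighted mean towards the
newer observations.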

Particularly where observations are weighted according
to recency, it may be useful to record the value of each
item profile on a periodic basis (e.g. daily, weekly,
monthly etc.) in order to track any changes in profile
values over time. These changes can then conveniently
be displayed using a graphical interface such as an item
position map of the type described below. In such a map
the changes in position can be marked as trajectories
across profile space and the time each profile was
calculated can be represented either by suitable
labelling or by colour coding or some other suitable
means.

Changes in customer (or personal) profiles can likewise
be tracked over time by periodically calculating and
recording profile values in respect of relevant sets of
items. These can then be displayed graphically either
individually (in the same way as for item profiles) or
net changes in the aggregate density of profiles across
profile space can be displayed by some suitable means
such as colour coding or 3D simulation according to
time. To aid understanding, these changes may be animated.

Preferably, a post-filtering processing step is provided
in addition to or instead of the pre-filtering
processing step.

Post-filtering processing will typically have primarily
commercial value, allowing a provider of the filtering
method of the invention to adjust the output before it
is used or displayed to an end-user (i.e. the user
viewing the results of the filtering method). This
addresses commercial concerns sometimes expressed
concerning filtering to the effect that the process
deprives the provider of a degree of marketing / sales
discretion.

In one preferred embodiment, the post-filtering
processing step is a rules based processing step which
excludes any items which do not fall within a defined
set of criteria from the predictions output from the
filtering step.

One problem that arises in filtering systems such as
that of the invention is that there is not enough data
available to provide accurate predictions until a
minimum number of users have provided their preferences
for a range of objects or until a minimum amount of
information has been gathered for a case. However, users
are unlikely to be motivated to provide this information
unless they will obtain a prediction after doing so.

Thus, in a preferred embodiment of the invention, a
different type of output giving an estimated prediction
such as for example the generic mean of the output can
be substituted for filtering predictions where, for
whatever reason, there is insufficient information
concerning either one or more items within the item
database or concerning one or more cases.

In this way, users will see that an output is provided
and so will be encouraged to provide their details and
preferences so that the database can be built up until
it contains sufficient information to implement the
filtering process of the invention.

Preferably, the estimated predictions are replaced
gradually by predictions obtained from the filtering
method of the invention as more data becomes available.
This can be achieved using various means including
Bayesian updating or, more simply, a weighted average of
the estimated and filtered predictions with the
weighting set according to the statistical uncertainty
of the filtering prediction (where the statistical
uncertainty is dependent on the amount of data
available).
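
The weighted average mentioned above may be illustrated
by the following sketch. It assumes that the uncertainty
of each prediction is expressed as a variance and that
the weights are inversely proportional to those
variances; the function name and both variance
parameters are hypothetical and are not taken from the
description.

```python
def blend_predictions(estimated, filtered, estimated_var, filtered_var):
    """Precision-weighted average of a fallback and a filtered prediction.

    estimated_var and filtered_var are assumed variances; filtered_var
    is expected to shrink as more data becomes available.
    """
    if estimated_var <= 0 or filtered_var < 0:
        raise ValueError("estimated_var must be > 0 and filtered_var >= 0")
    # Weight on the filtered prediction rises towards 1 as its
    # variance falls towards 0.
    w_filtered = estimated_var / (estimated_var + filtered_var)
    return w_filtered * filtered + (1.0 - w_filtered) * estimated
```

When the two predictions are treated as independent
Gaussian estimates of the same quantity, this weighted
average coincides with the posterior mean, so under
that assumption the simple weighting and the Bayesian
update give the same result.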

In an alternative preferred embodiment, the manager of
the database could generate a fixed number of phantom
cases. The profile of an item for which insufficient
data was available would be specified by the manager to
be a weighted average of some other items and the
phantom cases would be specified to rate that item with
ratings which depend on the manually determined
profile. Whenever a new actual case was added to the
database, a phantom case could be removed. Thus, over
time, the updated case profile would increasingly
reflect the observations for actual cases.
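
The replacement of phantom cases by actual cases might
be sketched as follows. For simplicity the sketch
assumes that every phantom case gives the item the same
manually chosen rating, whereas the description allows
ratings that depend on the manually determined profile;
the class and attribute names are hypothetical.

```python
from collections import deque


class PhantomSeededItem:
    """Ratings for one item, seeded with phantom cases (illustrative only)."""

    def __init__(self, manual_rating, n_phantoms):
        # Phantom ratings sit at the left of the queue.
        self.ratings = deque([manual_rating] * n_phantoms)
        self.phantoms_left = n_phantoms

    def add_actual(self, rating):
        # Each new actual case displaces one phantom case, if any remain.
        if self.phantoms_left > 0:
            self.ratings.popleft()
            self.phantoms_left -= 1
        self.ratings.append(rating)

    def mean(self):
        return sum(self.ratings) / len(self.ratings)
```

Once as many actual cases have been added as there were
phantom cases, the mean depends on actual observations
alone.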

The output from the filtering method of the invention 
could be used in a number of ways. Thus, the end-user 
of the filtering method may be notified of some or all 
of the results (possibly via a third party such as the 
provider site operator or a call centre staff member) or 
alternatively some or all of the output may be made 
available solely to one or more third parties (such as a 
provider) and not to the end-user. This might be useful 
for commercial purposes such as for example content 
management or advertising personalisation. 

Thus, in one preferred embodiment the invention provides
a data filtering service in which a database of
observations about a plurality of items for a plurality
of cases is obtained and analysed on an exclusive basis
for a single client. The database could be used as a
recommender service and/or for the client's content
management and/or for advertising selection.



Typically, this client would be a website service
provider selling a specific range of products.
Advantages of this arrangement include ease of
implementation, ability for the client to dictate the
parameters of the service fully, allowing total
customisation, exclusivity regarding the data collected
(possibly shared with the PCF service provider), and
exclusivity regarding the service provided (which may
have the commercial benefit of acting as a marketing
tool to attract new users and/or as a means for
increasing customer loyalty).

There are, however, significant disadvantages of this
arrangement. In particular, the amount of data that can
be collected is likely to be much less than for a pooled
service (unless the client is strongly pre-eminent in
its field). This will have an adverse effect on the
range, depth and precision of the predictions that may
be generated. Additionally, the service may prove less
convenient for users as it is well-known that Internet
users are deterred by an overabundance of registrations,
passwords, information requests and so forth. The
adoption of a pooled service with common registration
(in whatever form) and data acquisition is therefore
more attractive to Internet users who recognise that
they will receive a greater range of services (i.e. from
multiple sites) for their registration and data
inputting and are therefore even more likely to regard
the registration and data provision processes as
worthwhile. Thus, unless the client website operator is
pre-eminent in its field or intends to rely entirely on
passively collected data, the user uptake of the service
may be reduced vis a vis a comparable pooled service.

Consequently, in an alternative preferred arrangement
the invention provides a data filtering service in which
a database of observations about a plurality of items
for a plurality of cases is obtained and analysed to
provide a database which may be pooled with other
databases, the filtering service operating from the
pooled databases via linkage preferably through a
dedicated extranet. Under this arrangement a single
history database (i.e. a data set representing the
suitability of a plurality of objects for a plurality of
users) may be established, developed and maintained for
the class of clients being served as a whole.

The most significant advantage of this pooled
arrangement is that it allows significantly more widely
ranging, detailed and precise predictions for each
client than might ordinarily otherwise be the case.
Further advantages include improved user convenience
(due to the reduction in individual registrations and
data inputs required for access to the service via
multiple websites - as discussed above) and potentially
reduced development and maintenance costs for each
client due to scaling economies and cost sharing.

In one preferred arrangement, the pooled database is
configured such that, although the history database is
held in common as described above, contributing
websites retain either partial or complete exclusivity
in relation to the inputs and outputs from the database
in respect of those particular users that register
through their sites.

Thus, for example, other websites might be able to make
use of information concerning such individual users for
the purposes of obtaining predictions regarding
optimisation of site advertising or content for that
individual but would not be able to make use of the
information for the purpose of offering express advice
or recommendations to the individual user.

An advantage of this arrangement for the website
acquiring the information concerning the individual user
is that it can retain a degree of exclusivity in respect
of prediction/recommendation services to that user
whilst taking advantage of the data concerning
assessment of objects to provide wider, deeper and more
precise advice and recommendations to the user than
might otherwise be the case.

In a further preferred arrangement, database information
concerning individual users is held in a common pooled
database but either partial or complete exclusivity may
be maintained by individual clients in relation to
inputs and outputs in relation to specific classes of
item.

Such an arrangement might for example suit groups of
non-competing clients looking to co-market and/or
increase user convenience / minimise development /
maintenance costs. Depending on the degree of
inter-relationship between the specific classes of objects to
be assessed, such an arrangement may also allow more
precise predictions to be made, based upon additional
information concerning individual users or items
acquired by other participating websites. Thus, for
example, separate clients operating travel agency,
restaurant guide and wine selling sites might take
advantage of pooling of user information concerning
travel, dining and wine preferences to provide a more
precise and convenient service to users than would be
possible individually whilst at the same time limiting
user access to advice / recommendations relating to
their sales field to themselves as a marketing /
customer loyalty tool. Such a partial pooling
configuration would have particular value in optimising
advertising content as it would potentially allow
advertising in fields other than the client's primary
field of activity to be optimised with much greater
precision. In all cases, use could be made subject to
applicable data protection principles being observed.

The above has been described principally in terms of a
service by which an individual user interacts directly
with a service in real-time (either passively or
expressly or both). However, the service may equally
well be provided to users indirectly via the medium of a
third party such as, for example, a salesperson or call
centre operative.

In such instances, the third party would interact
directly with the service via any of the appropriate
means described above and interact with the ultimate
user by any reasonable method (typically either by
telephone or face to face communication, but potentially
also for example by e-mail, letter, video link or other
means).

A filtering service carried out on this basis may
provide the ultimate user with express predictions
giving rise to advice or recommendations, or it may not
be made known to the ultimate user but instead be used
to provide recommendations or advice based on
predictions to the third party (for example regarding
up-selling or cross-selling opportunities or simply
suggestions concerning appropriate
recommendations / advice that the third party might
choose to make), or it may be used for a number of
different purposes some of which are made known to the
ultimate user and some of which are not.

The service might operate in real-time or not. In other
regards the process would operate in the same manner as
described above except where the practical context
provides otherwise. (Thus, for example, it would not
normally be possible to use images to acquire exogenous
standards information from ultimate users by telephone -
although it might be in a face to face context where a
display screen was available (e.g. in a shop or travel
agency)).

Using such a service provides the ultimate user with 
many of the benefits of the on-line service and provides 
the third party with very useful customer service and 
sales tools, and / or a means of supplementing the 
skills base of its operatives as well as the other 
advantages discussed more generally above. 

It will be noted that prediction/recommendation services 
may also be provided to clients through multiple 
channels such that the service can be delivered to users 
via one of several touch points across the client - user 
interaction interface. Thus, for example, a travel 
agency might provide its customers with the same 
filtering based advice drawing upon the same databases 
via inter alia the Internet, WAP, digital interactive 
TV, its call centres and retail shops according to the 
requirements of its customer. This flexibility provides 
significant customer service benefits to both client and 
customer. 



The primary use of a filtering service according to the
invention to provide predictions concerning the
preferences, likely courses of action, decisions and
responses of individuals has already been discussed. In
addition, the information contained within the history
databases may preferably be marketed to various third
parties particularly as a source of market information
whether in regard of the characteristics of the
individual constituent users (e.g. for the compilation
or acquisition of mailing / prospect lists or for the
purpose of datamining of whatever applicable form) or in
regard of aggregate information concerning either users
or objects assessed or both (e.g. for the purpose of
datamining of whatever applicable form or for
benchmarking, profiling, obtaining trend / time series
data or any other recognised management, marketing or
market research purpose).

As an adjunct to this it is considered preferable that 
an archive of history data be maintained and a means 
employed to facilitate the searching for, collation and 
analysis of data from this archive according to various 
criteria including by date. This will greatly enhance 
the usefulness of such data for the purpose of off-line 
sales most particularly in the provision of all forms of 
time dependent analysis and information. 

In one preferred embodiment of the invention, an
indication of the level of personalisation of the
predictions provided is given at the user interface.
This will inform the user of how targeted the 
recommendations provided are to his or her particular 
tastes. This has the advantage that the user will be 
encouraged to input more information into the database 
as they will see a direct result in an increase in the 
level of personalisation of recommendations. It will 
also provide a useful indication to the user of when 
there is no point answering any further questions as the 
level of personalisation will stop increasing. 

The provision of an indication of the level of
personalisation of recommendations generated by a
collaborative filtering engine is believed to be novel
and inventive in its own right and so, from a further
aspect, the present invention provides a method of
providing an indication of the level of personalisation
of recommendations generated by a collaborative
filtering engine to a user at the user interface.



The indication of the level of personalisation could for 
example be provided by a sliding scale representing a 
personalisation score. 



In one preferred embodiment, the recommendations are 
generated by a filtering method according to the 
invention and the personalisation score is obtained by 
determining the average variance of the probability 
distribution over each characteristic for the case in 
question. 
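
As an illustration of the calculation just described,
the sketch below computes the average variance over a
set of characteristics. It assumes that each
characteristic has a discrete probability distribution
given as a pair of value and probability lists; that
representation and the function names are hypothetical.

```python
def variance(values, probs):
    """Variance of a discrete distribution over the given values."""
    mean = sum(v * p for v, p in zip(values, probs))
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probs))


def average_variance(distributions):
    """Mean of the per-characteristic variances for one case.

    distributions is a list of (values, probs) pairs, one pair per
    characteristic (an assumed representation).
    """
    return sum(variance(v, p) for v, p in distributions) / len(distributions)
```

A lower average variance indicates that the case
profile is more tightly determined, so a sliding scale
could be driven by how far this figure has fallen as
the user supplies more information.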



Preferably, the recommendations provided to the user at
the user interface are updated each time that the user
enters a further piece of information into the database.
This will further encourage the user to input
information as they will obtain a direct result by so
doing.

Still more preferably, the user interface is a web site
and the inputting of information is carried out on the
same page on which the personalisation level indicator
and the recommendations are displayed.

In one preferred embodiment of the filtering method of
the invention, each item in the data set is plotted
against a first component of the item profile and a
second component of the item profile on the x and y axes
respectively. Thus, the relative characteristics of
the items in the data set can be compared to one another
by a user such as a marketing executive viewing the
graphical representation thereof.



If the user considers that the position of an item is
incorrect, he can move that item thus imposing a
different profile on it. This could for example be
useful if the user considered the item profile component
on the x axis to represent some characteristic of users
(for example yuppiness) to which items appealed and
wished to market an item to more young people even
though the profile calculated by the profile engine
showed the item to be popular exclusively amongst older
people.

This method of imposing a profile on an item is
considered to be novel and inventive in its own right
and so from a further aspect, the present invention
provides a method of filtering data in which a function
is set up which models a set of data representing
observations about a plurality of items for a plurality
of cases, as a function of a plurality of item profiles
and case profiles each containing a set of unknown
parameters defining characteristics of the case or item,
and a best fit of the function to the data is found in
order to find the values of the unknown parameters, the
unknown parameters for each item are compared to one
another and, if desired, an operator alters one or more
of the unknown parameters for one or more of the items
before using the sets of unknown parameters to analyse
the underlying trends in the data.

Preferably, the parameters found together with the
altered parameters are used together with the function
to predict an observation about one or more items for a
particular case for which data is not available.
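
The use of fitted parameters together with
operator-altered parameters may be sketched as follows.
The sketch assumes, purely for illustration, that the
modelling function is a dot product of a case profile
and an item profile; that form, the function names and
the example item names are hypothetical.

```python
def predict(case_profile, item_profile):
    """Assumed stand-in for the modelling function: a simple dot product."""
    return sum(c * i for c, i in zip(case_profile, item_profile))


def effective_profiles(fitted, overrides):
    """Item profiles to use for prediction.

    Operator-imposed profiles in overrides replace the fitted ones;
    every other item keeps its fitted profile.
    """
    return {item: overrides.get(item, profile)
            for item, profile in fitted.items()}
```

Thus an operator who drags an item across the position
map would in effect add an entry to overrides, and later
predictions for that item would use the imposed profile
rather than the fitted one.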

From a further aspect, the invention extends to a method
of controlling a recommendation engine. Further, the
method extends to a method of using information about
items by restricting the item profiles.

It will be appreciated that the filtering methods
according to the invention would usually be implemented
through the appropriate computer software. Thus, from
further aspects, the invention provides computer
software for carrying out the methods described above.
This extends to software in any form, whether on media
such as disks or tapes or supplied from a remote
location by e.g. the Internet. The software may be in
compressed or encoded form, or as an installation set.
The invention also extends to data processing apparatus
programmed to carry out the methods. The methods may be
carried out on one or more sets of apparatus, and may be
distributed geographically. The steps of the method may
be divided up, and the invention extends to performing
some steps only and supplying data to another party who
may carry out the remaining steps.

Preferred embodiments of the invention will now be
described by way of example only, and with reference to
the accompanying drawings in which:

Figure 1 schematically shows the arrangement of a
filtering system according to the invention;

Figure 2 schematically shows a page of a website using a
filtering method according to the invention;

Figure 3 shows a set of raw data about a plurality of
users' preferences as displayed to a user in software
embodying the invention;

Figure 4 shows a pair-wise correlation of the data of
Figure 3;

Figure 5 shows a plot of first and second item profile
components for each item in the data set of Figure 3 as
provided by software embodying the invention; and

Figure 6 shows a plot of groups of users having similar
profiles against the first and second item profile
components as provided by software embodying the
invention.

WO 02/10954 PCT/GB01/03383

The filtering method of the invention is a predictive 
technique that builds, estimates and uses a predictive 
model of the observations about items for different 
cases in terms of case profiles for each case which 
include hidden metrical variables. The predictive model 
can for example be used to predict which of a number of 
items is most likely to arise next, or to predict the 
values of a number of missing observations. The method 
is applicable to all circumstances where conventional 
collaborative filtering would find application but is 
not limited to these uses. 

The method is embodied by a computer program or software
for carrying out the method and the program is adapted
to provide recommendations of items to an individual
user who accesses the information via an Internet
website. The recommendations are provided to the
website by a filtering engine described below.

The filtering engine includes an off-line profile engine 
8 and a real-time recommendation engine 10 as shown in 
Figure 1. The off-line profile engine contains a 
database of data relating to the preferences of various 
users for various items stored in storage means 7. This 
data could have been obtained by asking users to rate 
each of a list of items and/or by monitoring users'
click histories while on-line. 

When a user logs on to a web-site using the filtering 
engine they are asked to rate various items so that the 
engine can store a history for the user. The filtering
engine builds up and stores a database that records 
observations about a number of users. 



Recommendations made by the method of the invention are
based on learning about a user's profile from 
observations about her. Data about the user (and the 
data about previous users which makes up the database) 
can be gathered from a number of sources including: 

• from a website
• by questionnaire or survey
• by phone
• from bank records or other sources of transaction
  history
• customer service records



Observations about users which can be included in the
database can include:

• Click-stream history for single visits to a web-site.
  If a user visited the same web-site on a number of
  occasions, the click-stream history for each visit
  would form a separate record in the database.

• Combined click-stream history for all of a user's
  visits to a web-site. In this case the user would need
  to identify herself to the web-site so that details of
  different visits can be stored and matched up.

• Ratings of objects. For example the user may be asked
  to rate various products that she has experienced.

• Answers to questions, either just from this visit to
  the website, or combined for all visits.

• Responses to "exogenous standards". Examples of these
  are a photograph of scenery for holiday preference
  selection or descriptions of TV programmes for book
  preference selection. The exogenous standards used can
  be in multimedia and include any form of graphic image,
  photograph, sound or music as well as a conventional
  passage of text, a name or other written description.

• Demographic and other information about the user.

• The user's purchase history, either just for this
  visit to the website, or combined for all visits.



The observations about a user from different touchpoints 
can be aggregated into a single set. To do this the 
client implementing the filtering system will need to 
ensure that identification procedures recognise the user
no matter what touchpoint she uses.



In one preferred embodiment of the filtering engine of 
the invention, the off-line profile engine estimates 
item profiles which can be used to generate
recommendations by the following method. 



Firstly, the profile engine specifies a model for the
stored dataset. To do this, the following steps are
carried out:

1. Each user i in the dataset ($i = 1, 2, \ldots, I$) is
associated with a user profile $a_i$, where the set of
all user profiles is A.

Each user profile contains Q components, where each
component is an unobservable metrical variable. The
number of components can be selected using model
selection techniques as is described further below.
Alternatively, Q can be set at a value that gives a
reasonable compromise between speed of execution,
accuracy and intelligibility of results (Q = 2 or 3
would normally be suitable values for such a
compromise).

2. Each item j in the dataset ($j = 1, 2, \ldots, J$) is
associated with an item profile $b^j$, where the set of
all item profiles is B. Each item profile contains
Q+1 components.

3. A model $h(a_i, b^j)$ is specified that generates a
predicted observation, $\hat{h}_i^j$, for each user i and
each item j:

$$\hat{h}_i^j = h(a_i, b^j), \quad j = 1, 2, \ldots, J, \quad i = 1, 2, \ldots, I$$

where the set of all predicted observations is $\hat{H}$.



As an example, suppose that each observation records 
whether or not a user has chosen the object, there are 
no missing observations, and so all values are either 0 
or 1. A common way to model this kind of observation is 
to suppose that the probability that a customer chooses 
an item depends on a constant term that reflects the 
general attractiveness of the item to all customers. It 
also depends on the interaction between the user's 
profile and that of the object. A common specification 
for binary observations of this kind uses the logit 
distribution. 



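$$h(a, b^j) = \begin{cases} 1 & \text{if } \operatorname{logit}^{-1}\left(b_0^j + \sum_{q=1}^{Q} a_q b_q^j\right) > 0.5 \\ 0 & \text{otherwise} \end{cases}$$

where $\operatorname{logit}^{-1}(x) = \dfrac{1}{1 + e^{-x}}$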
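By way of illustration only, the logit model of this example might be sketched in Python as follows. The function names `inv_logit`, `choice_probability` and `predict` are hypothetical and are not taken from the specification. The item profile is assumed to be held as a list whose first element is the constant term.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit, 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def choice_probability(a, b):
    """Probability that a user with profile a chooses an item with profile b.

    a holds the Q user components; b holds Q + 1 item components, with
    b[0] the constant term reflecting the item's general attractiveness.
    """
    return inv_logit(b[0] + sum(a_q * b_q for a_q, b_q in zip(a, b[1:])))


def predict(a, b) -> int:
    """Predicted binary observation: 1 if the choice probability exceeds 0.5."""
    return 1 if choice_probability(a, b) > 0.5 else 0
```

For instance, `predict([1.0, 0.5], [-0.2, 1.0, 0.4])` evaluates the inverse logit at -0.2 + 1.0 + 0.2 = 1.0, which exceeds 0.5, so the predicted observation is 1.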



Once the model has been specified, the item profiles
(i.e. the model parameters) are estimated so that the set
of predicted observations, $\hat{H}$, approximates the
actual set of observations, H. To fit the data, the
system chooses those parameter values that maximise the
likelihood of the observed data.



To do this, the likelihood of the data is first
specified by carrying out the following steps:

Specify the model in terms of a likelihood
function, $f(h \mid a_i, b^j)$. This gives the probability
of an observation given the relevant user and
object profiles.

$$h(a_i, b^j) = \arg\max_h f(h \mid a_i, b^j)$$

where $f(h \mid a_i, b^j) = \Pr(h_i^j = h \mid a_i, b^j)$



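Thus, in the example

$$f(h \mid a, b) = \begin{cases} \operatorname{logit}^{-1}\left(b_0 + \sum_{q=1}^{Q} a_q b_q\right) & \text{if } h = 1 \\ 1 - \operatorname{logit}^{-1}\left(b_0 + \sum_{q=1}^{Q} a_q b_q\right) & \text{if } h = 0 \end{cases}$$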



Aggregate across users and items, and take the
natural log, to give the loglikelihood of the data,
$LL(H \mid A, B)$. The independence assumption allows
this to be expressed as:

$$LL(H \mid A, B) = \ln \prod_{i,j} f(h_i^j \mid a_i, b^j)$$
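A minimal sketch of this loglikelihood for the binary logit example is given below. The name `log_likelihood`, the nested-list layout of H, A and B, and the small clamp `eps` used to keep the logarithm finite are assumptions made for illustration only.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit, 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def log_likelihood(H, A, B, eps=1e-12):
    """LL(H | A, B) for binary observations under the logit model.

    H[i][j] is user i's observation (0 or 1) for item j, A[i] is user i's
    profile (Q values) and B[j] is item j's profile (Q + 1 values).
    """
    total = 0.0
    for h_i, a_i in zip(H, A):
        for h, b in zip(h_i, B):
            p = inv_logit(b[0] + sum(a_q * b_q for a_q, b_q in zip(a_i, b[1:])))
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total += math.log(p if h == 1 else 1.0 - p)
    return total
```

Because the observations are assumed independent given the profiles, the log of the product becomes the double sum accumulated in the loop.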



Once the likelihood of the data has been specified, the
item profiles are estimated by choosing the set of item
profiles B that maximise the likelihood of the observed
data H, conditional on user profiles. This gives the
equation

$$\hat{B} = \arg\max_X LL(H \mid A, X)$$



The problem with solving this equation is that the user
profiles A are unobserved. To deal with this, a set of
estimates for the user profiles is derived via a set of
pseudo-item profiles. To do this the following steps
are carried out.

Use a simple linear model to derive pseudo-item
profiles. Appropriate examples include the normal
linear factor model and Principal Component Analysis.
Thus, one simple linear model that could be used in the
example is the normal linear factor model. This models
the data by assuming that, conditional on the user
profile, observations are random variables with a normal
distribution. The model also assumes that user profiles
are independent random variables which are also normally
distributed:

$$h^j \mid a \sim N\left(c_0^j + \sum_{q=1}^{Q} a_q c_q^j,\ (\sigma^j)^2\right)$$

and $a \sim N_Q(0, I)$

The pseudo-item profiles are then found as those
parameters, $C = (c^1, \ldots, c^J)$ and $\sigma^j$,
$j = 1, \ldots, J$, that maximise the likelihood of the
data. A number of software packages, such as S-PLUS, have
pre-programmed routines to estimate this model. Often
these routines will generate C as standardised factor
loadings. This means that factor loadings are relevant to
a model where the observations about an item are first
normalised to have unit variance. There is no fixed
component, $c_0^j$, in this case. Standardised factor
loadings can be used to generate estimated user profiles
without modification.

A suitable estimate of each user's profile is to use
what is often referred to in factor analysis as the
score:

$$\hat{a}_q = \sum_{j=1}^{J} h^j c_q^j, \quad q = 1, \ldots, Q$$
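Assuming the score is taken as the loading-weighted sum of a user's observations, the estimated user profile might be computed as in the following sketch. The function name `user_scores` and the layout of the loadings as a J-by-Q nested list are illustrative assumptions.

```python
def user_scores(h, C):
    """Estimated user profile from pseudo-item profiles (factor loadings).

    h[j] is the user's observation for item j and C[j][q] is the loading of
    item j on component q, so a_hat[q] = sum over j of h[j] * C[j][q].
    """
    Q = len(C[0])
    return [sum(h_j * c_j[q] for h_j, c_j in zip(h, C)) for q in range(Q)]
```

The same function would be applied to each user's row of the dataset in turn to build the full set of estimated user profiles.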

Once the estimates of the user profiles have been
obtained, these can be entered into the likelihood
equation for the data. This leaves only the item
profiles as free parameters, and they can be estimated
using well known maximum likelihood or least squares
techniques.

$$\hat{B} = \arg\max_X LL(H \mid \hat{A}, X)$$



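In the example this step leads to a standard logit
regression model, which is available pre-programmed in
most statistical packages.

$$\hat{B} = \arg\max_X LL(H \mid \hat{A}, X)$$

where

$$f(h \mid a, b) = \begin{cases} \operatorname{logit}^{-1}\left(b_0 + \sum_{q=1}^{Q} a_q b_q\right) & \text{if } h = 1 \\ 1 - \operatorname{logit}^{-1}\left(b_0 + \sum_{q=1}^{Q} a_q b_q\right) & \text{if } h = 0 \end{cases}$$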
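In practice a pre-programmed logit regression routine would normally be used for this step. Purely as a sketch of what such a routine does, one item's profile could be fitted by plain gradient ascent as shown below. The function name `fit_item_profile`, the step count and the step size are arbitrary illustrative choices, not values taken from the specification.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit, 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_item_profile(h_col, A_hat, steps=2000, rate=0.1):
    """Fit one item's profile b = (b_0, b_1, ..., b_Q) by gradient ascent on
    the logit loglikelihood, holding the estimated user profiles fixed.

    h_col[i] is user i's 0/1 observation for this item; A_hat[i] is that
    user's estimated profile (Q values).
    """
    Q = len(A_hat[0])
    b = [0.0] * (Q + 1)
    n = len(h_col)
    for _ in range(steps):
        grad = [0.0] * (Q + 1)
        for h, a in zip(h_col, A_hat):
            x = [1.0] + list(a)  # leading 1 multiplies the constant b_0
            p = inv_logit(sum(b_k * x_k for b_k, x_k in zip(b, x)))
            for k in range(Q + 1):
                grad[k] += (h - p) * x[k]
        b = [b_k + rate * g / n for b_k, g in zip(b, grad)]
    return b
```

Because the estimated user profiles are held fixed, each item can be fitted independently of the others, which is what reduces the step to J separate logit regressions.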





To choose the number of components Q, estimate the item
profiles for Q = 1, 2 and 3. For each model calculate the
Akaike Information Criterion, which is given by

$$AIC = -2\,LL(H \mid \hat{A}, \hat{B}) + 2p$$

where p is the number of free parameters being estimated
and is given by:

$$p = (Q + 1)J$$

and where the loglikelihood for the data is found by
entering the item profiles and the estimated user
profiles into the predictive model. Choose the value of
Q that gives the lowest value of the AIC.
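The selection rule can be sketched as follows. The function names and the dictionary mapping each candidate Q to its maximised loglikelihood are hypothetical, and the loglikelihood values in the usage note are invented solely to show the arithmetic.

```python
def aic(loglik: float, Q: int, J: int) -> float:
    """Akaike Information Criterion with p = (Q + 1) * J free parameters."""
    p = (Q + 1) * J
    return -2.0 * loglik + 2 * p


def choose_Q(logliks: dict, J: int) -> int:
    """Return the candidate Q whose fitted model has the lowest AIC.

    logliks maps each candidate Q to the maximised loglikelihood obtained
    with that number of components.
    """
    return min(logliks, key=lambda Q: aic(logliks[Q], Q, J))
```

With J = 4 items and made-up loglikelihoods of -50, -40 and -39 for Q = 1, 2 and 3, the AIC values are 116, 104 and 110, so Q = 2 would be chosen: the small gain in fit from the third component does not justify the four extra parameters.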

Putting this value of Q back into the equation for the
item profiles together with the estimated user profiles
allows values to be obtained for the item profiles using
the maximum likelihood techniques described above. The
item profiles are then used to make recommendations in
the real-time recommendation engine as will be described
later.

Once the item profiles have been estimated, they are
used to recommend items to a user. Recommendations to a
user involve two steps. However, although not discussed
here, the two steps could be implemented together by a
single function or piece of code.

1. Learn about the user's profile from existing
observations about her.

2. Use this knowledge about the user profile to make
predictions about future observations, and base
recommendations on these predictions.

Each step is discussed in turn, and for each step there
are two methods which can be used. These are known as
Approach 1 and Approach 2 respectively.

Step 1: Learn about the user's profile

Approach 1 (Bayesian) The preferred method is to
represent knowledge about the user's profile as a
probability distribution over possible profiles, and to
use Bayesian inference, combined with the predictive
model, to generate a posterior distribution
$\alpha(a \mid h)$ by updating a prior distribution
$\alpha(a)$. Standard results give:

$$\alpha(a \mid h) = \frac{\alpha(a)\, L(h \mid a, B)}{\sum_{a} \alpha(a)\, L(h \mid a, B)}$$

where $L(h \mid a, B) = \prod_{j} f(h^j \mid a, b^j)$
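As a minimal sketch of the Bayesian update, assume that the possible profiles form a finite grid and that observations follow the binary logit model of the earlier example. The names `likelihood` and `posterior` and the uniform default prior are illustrative assumptions.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit, 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def likelihood(h, a, B):
    """L(h | a, B): product over items of f(h_j | a, b_j), binary logit model."""
    total = 1.0
    for h_j, b in zip(h, B):
        p = inv_logit(b[0] + sum(a_q * b_q for a_q, b_q in zip(a, b[1:])))
        total *= p if h_j == 1 else 1.0 - p
    return total


def posterior(h, B, grid, prior=None):
    """Posterior over candidate profiles (tuples) in grid, given history h."""
    if prior is None:
        prior = {a: 1.0 / len(grid) for a in grid}  # assumed uniform prior
    weights = {a: prior[a] * likelihood(h, a, B) for a in grid}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}
```

The denominator of the update is simply the sum of the unnormalised weights, so the returned values sum to one over the grid.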
Approach 2 The classical statistical approach, which is
also effective, would be to maximise the likelihood of
the user's observations, given the predictive model and
the estimated item profiles.

$$\hat{a} = \arg\max_x LL(h \mid x, B)$$

where $LL(h \mid x, B) = \ln \prod_{j} f(h^j \mid x, b^j)$
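The point estimate could be obtained with any numerical optimiser. The sketch below substitutes a simple search over a finite grid of candidate profiles, which is an assumption made to keep the example short, and again uses the binary logit model. The names `user_log_likelihood` and `ml_profile` are hypothetical.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit, 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def user_log_likelihood(h, x, B, eps=1e-12):
    """LL(h | x, B): sum over items of ln f(h_j | x, b_j), binary logit model."""
    total = 0.0
    for h_j, b in zip(h, B):
        p = inv_logit(b[0] + sum(x_q * b_q for x_q, b_q in zip(x, b[1:])))
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += math.log(p if h_j == 1 else 1.0 - p)
    return total


def ml_profile(h, B, grid):
    """Candidate profile in grid that maximises the user's loglikelihood."""
    return max(grid, key=lambda x: user_log_likelihood(h, x, B))
```

A finer grid, or a gradient-based optimiser in place of the grid, would give a more precise estimate at greater computational cost.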

Step 2: Make recommendations

To make recommendations to a user the knowledge of the
user's profile is combined with the predictive model,
taking the item profiles as known. This generates
predictions for the user's choices of objects and/or
ratings of objects. The method depends on what approach
is being used.

Approach 1 (Bayesian) In this case knowledge about the
user profile is represented as a distribution over
possible profiles, $\alpha(a \mid h)$, and the predictive
model generates, for each object, a probability
distribution over possible observations. One method is to
use a summary statistic for this distribution, the
expected prediction $p^j(h)$ for object j. When the
observation records whether the user has chosen the
object or not the summary statistic is the probability
that it has been chosen:

$$p^j(h) = \sum_{a} f(1 \mid a, b^j)\, \alpha(a \mid h)$$

When the observation records the user's rating for an
object a possible summary statistic is the expected
rating:

$$p^j(h) = \sum_{a} \sum_{x} x\, f(x \mid a, b^j)\, \alpha(a \mid h)$$

where the dummy variable x is a typical observation
about item j.

The actual recommendations will depend on the context
and various commercial considerations, as well as on
predicted observations. The basic assumption here is
that it is good to recommend items that it is predicted
the user would rate highly, or that the user is likely
to choose. One simple recommendation rule would then be
to recommend the object, which has not yet been chosen,
with the highest expected prediction, or to recommend
the object, which has not yet been rated, with the
highest expected prediction.

Approach 2 In this case knowledge about the user is
represented as a point estimate for the user profile,
$\hat{a}$, and the predictive model generates, for each
object, a probability distribution over possible
observations. Using analogous summary statistics to those
for Approach 1 gives, for observations recording choices:

$$p^j(h) = f(1 \mid \hat{a}, b^j)$$

and for observations recording ratings:

$$p^j(h) = \sum_{x} x\, f(x \mid \hat{a}, b^j)$$

The same simple recommendation rule suggested for
Approach 1 is appropriate for Approach 2.
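The simple recommendation rule for choice data might be sketched as below, again assuming the binary logit model and a discrete distribution over profiles. The function names are hypothetical. Approach 2 is treated as the special case in which all the probability mass sits on the point estimate.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit, 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def expected_choice_probability(profile_dist, b):
    """Expected prediction for one item: sum over a of f(1 | a, b) * weight(a).

    profile_dist maps candidate profiles (tuples) to probabilities. For
    Approach 2, pass {a_hat: 1.0}.
    """
    return sum(
        w * inv_logit(b[0] + sum(a_q * b_q for a_q, b_q in zip(a, b[1:])))
        for a, w in profile_dist.items()
    )


def recommend(profile_dist, B, h):
    """Index of the not-yet-chosen item with the highest expected prediction,
    or None if every item has already been chosen."""
    candidates = [j for j, h_j in enumerate(h) if h_j == 0]
    if not candidates:
        return None
    return max(candidates,
               key=lambda j: expected_choice_probability(profile_dist, B[j]))
```

In a deployed system the returned index would be only one input to the final recommendation, since context and commercial considerations also play a part.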

An example of one implementation of the above described
method is given in Appendix A.

The method of estimating the item profiles as described
above can be extended to deal with situations in which
it is appropriate to consider items in separate groups
with separate sets of user profile components associated
with each group when deriving the pseudo-item profiles
and the estimates of the user profiles. This might for
example be because the dataset contained some items
relating to preferences over objects and some indicators
of socioeconomic group. By treating these groups
separately, the number of free parameters that need to
be estimated for a given number of overall components in
a user profile is reduced. If the two groups do largely
act as indicators of different components of the user's
profile then this approach can lead to better estimates
of the parameters that remain and to more accurate
predictions.

An example of the method of deriving item profiles,
showing how to implement the method when the data is
divided into two classes, is given in Appendix B. The
example does not show recommendations, since the process
would be exactly the same as for the example above.
Neither is it shown how to derive the number of
components using the AIC as the method would be the same
as in the previous example. Here it is assumed there
will be two components associated with each group of
items.

In another alternative embodiment of the method, some
items can be treated directly as observed components of
the user profile. This might be appropriate for items
such as user age which are exogenous, in other words
they are causes of other aspects of the user's
observations rather than being the result of other
hidden variables.

Appendix C gives an example showing how to implement
the method when using exogenous data. The example does
not show recommendations, since the process would be
exactly the same as for the example of the basic method.
Neither is it shown how to derive the number of
components using the AIC as the method would be the same
as in the previous example. Here it is assumed there
will be two components.

In an alternative embodiment of the method of the
invention, point estimates of the parameters making up
the case and item profiles are obtained. To do this a
database is obtained which consists of user histories h
for a set of users indexed 1, 2, ..., I; a set of user
profiles, a, one for each user,
$a = (a_1, a_2, \ldots, a_I)$; a set of object profiles,
b, one for each object, $b = (b_1, b_2, \ldots, b_J)$; an
estimation function $H(a_i, b_j)$; and a recommendation
function $R(a_i, b_j)$, with the properties that:



The user history for user i,
$h_i = (h_i^1, h_i^2, \ldots, h_i^J)$, records the
available information about that user's scores for the
objects, so that $h_i^j$ is user i's score for object j.
For each user the dataset may contain information on only
some objects. Scores can be discrete, categorical or
ordinal, and in particular may be binary, or continuous.
What the scores represent depends on the context, but
examples include the user's enjoyment of the object, or a
binary variable indicating whether the user has sampled
that particular object or not.



Function $R(a_i, b_j)$ uses user i's profile $a_i$, and
object j's profile $b_j$, to rate object j for user i, if
the database does not record user i's score of j.
Recommendations about whether user i should sample
object j can be based either on the outcome of R(.,.)
alone, or on a comparison of R(.,.) for a set of
different objects.

User i's profile and object j's profile are chosen so
that $H(a_i, b_j)$ is a good estimate of user i's score
for object j, if that score is already in the database,
for all users i and objects j taken together.

H(.,.) and R(.,.) can estimate histories and provide
recommendations for hypothetical user profiles and for
hypothetical object profiles.

In the operation of the offline profile generator the
following steps are undertaken:

a) the current database of user histories, h, the
existing matrix of user profiles a (if recorded) and a
matrix of object profiles b, and the estimation
function H(.,.) are inputted;

b) the matrix is updated, choosing (a,b) so that the
history model H(.,.) estimates the user history. The
existing matrix may act as the initial point of a
numerical algorithm;

c) the updated matrix of object profiles, b, and, if
recorded, the user profiles, a, are outputted.

The real time recommendation engine is then operated as
follows:

a) The user id is inputted, the user history from the
database h is looked up and, if user profiles are
recorded, the current user profile from the database a
is looked up. The subset of objects that are to be
rated; the object profile database b; the rating
function R(.,.); the estimation function H(.,.); and an
indication of whether the user profile needs to be
recalculated are inputted.

b) If the user history has changed since last visit,
or if user profiles are not recorded, then the user
profile $a_i$ is updated. $a_i$ is chosen so that
$H(a_i, b)$ estimates the user history $h_i$. If
appropriate, the old user profile is used as a starting
point for the algorithm that updates $a_i$. Thus, the
system determines whether or not the user history has
changed since last accessing the filtering system. If
yes, the user profile $a_i$ is calculated and recorded.
If not then the user profile $a_i$ is simply looked up.

c) For each object in the subset the rating is then
calculated according to R(.,.), using the user's profile
and the object profile as parameters.

d) The list of ratings is then outputted. These will
form the basis of the recommendations to the user.

e) If user profiles are recorded in the system, the
updated user profile $a_i$ is saved.

In one preferred embodiment of the invention an
Unobserved Attribute Model (UAM) is used for the
estimation function H(.,.).

A UAM starts from the assumption that users and objects
can be described by vectors that list their level of
each of a number of (unobservable) characteristics,
where the number of characteristics is less than some
fixed limit. For example $a_i^x$ would give user i's
level of characteristic x, and $b_j^y$ would give object
j's level of characteristic y.

These characteristics together determine the
observations in the user-history database. An example
would be where the database holds information on whether
a user has been to a London visitor attraction or not.
Assume that the probability that user i has visited
attraction j is
$\phi\left(a_i^1 + b_j^1 + \sum_{x=2}^{X} a_i^x b_j^x\right)$,
for some probability distribution $\phi$. Here the user
would be more likely to visit the attraction if the
characteristics for which she has a high score are the
same as the characteristics for which the attraction has
a high score. There is also an allowance for the
possibility that the user is more likely than most to
visit any attraction, and that this is a particularly
popular attraction. This kind of model assumes that users
'care' about some factors more than others, and make
their decisions based on whether or not the factor they
care about is present.

Another example of a plausible model would be if the
probability that user i has visited attraction j is
given by
$\phi\left(a_i^1 + b_j^1 + \sum_{x=2}^{X} |a_i^x - b_j^x|\right)$,
for some probability distribution $\phi$. Here users want
to go to the place that most closely matches their own
preferences. So if a user's rating for characteristic 3
was low, she would prefer to visit attractions which also
had a low rating for characteristic 3, other things being
equal.

One general approach to deriving a UAM is to set up a
likelihood function that outputs the likelihood of the
observed history, given the current estimate of the user
profiles and object profiles, and then to choose those
user and object profiles that maximise the likelihood of
the observed history.

The likelihood functions would be maximised according to
the methods known in the art. Sources which describe
these known maximisation methods include "Maximum
Likelihood Estimation with STATA" by W. Gould & W.
Sribney, pub. Stata Press, College Station, Texas, 1999.
An alternative approach might be to use genetic
algorithms.

The preferred embodiment, however, exploits the 
particular structure of the data base, which can be seen 
either as a set of user histories, recording how each 
user scored the objects, or as a set of object 
histories, recording how each object was scored by 
users . 

This structure suggests that an iterative procedure can
be used to derive the user and object profiles that 
maximise the likelihood of the observed data. Each 
iteration comes in two parts. In the first the current 
object profile estimates are held constant, while the 
user profiles are updated to record those that maximise 
the likelihood of the data, given the object profiles. 
In the second part the user profiles are held constant 
while the object profiles are updated to record those 
profiles that maximise the likelihood of the data, given 
the user profiles. 

Any convergence point of this iterative algorithm will
maximise the likelihood of the observed data. This
method to derive a UAM is described below.

To initialise the algorithm:

a) Firstly, a likelihood function $P(h \mid a, b)$ is set
up that gives the likelihood of observing history h, given
user profiles a and object profiles b. The likelihood
of an element of the database is assumed to be an
independent random variable, given the profiles of the
object and user. The likelihood of the data as a whole
can therefore be written as

$$P(h \mid a, b) = \prod_{i=1}^{I} \prod_{j=1}^{J} f(h_i^j \mid a_i, b_j)$$

The function should be chosen bearing in mind that the
estimate of the history, H(a,b), takes the same
arguments as the likelihood function.

From the likelihood function, two sets of loglikelihood
functions are defined, one for the user profiles as a
function of known item profiles, which is:

$$L(a_i \mid B) = \ln \prod_{j=1}^{J} f(h_i^j \mid a_i, b_j) = \sum_{j=1}^{J} \ln f(h_i^j \mid a_i, b_j)$$

and one for the item profiles as a function of known
user profiles, which is:

$$L(b_j \mid A) = \sum_{i=1}^{I} \ln f(h_i^j \mid a_i, b_j)$$

Then, for each item j, an initial value for the item
profile, $b_j^0$, is defined. As an example the initial
values could be random variables.

Alternatively the current object profiles, from the
previous estimation of the UAM, could be used as the
starting point.

For each user i an initial value for the user profile,
$a_i^0$, is defined. As an example these could be the
current user profiles.

Once the algorithm has been initialised, it is run to
convergence by an iterative process comprising the
following steps:

a) User profiles $A^{t+1} = (a_1^{t+1}, \ldots, a_I^{t+1})$
are chosen to maximise the loglikelihood of the user
profiles as a function of known item profiles $B^t$:

$$a_i^{t+1} = \arg\max_{a_i} L(a_i \mid B^t)$$

b) Object profiles $B^{t+1}$ are chosen to maximise the
loglikelihood of the item profiles as a function of
known user profiles $A^{t+1}$:

$$b_j^{t+1} = \arg\max_{b_j} L(b_j \mid A^{t+1})$$

Steps a and b are then repeated until there is
convergence in the values found, at which point the
values of the user and item profiles found are taken as
the solution to the function.

One way of determining whether or not the item and user
profiles have converged sufficiently is to calculate the
loglikelihood of the data (i.e. the value of
$L(b_j \mid A)$) and to consider there to have been
sufficient convergence if the percentage change in the
loglikelihood is less than some pre-set value, such as
0.1.
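The iteration and its stopping rule might be organised as in the following sketch. The three callables `update_users`, `update_items` and `loglik` are placeholders for whatever maximisation routines and loglikelihood a particular implementation supplies. Their names, the `max_iter` safeguard and the reading of the 0.1 threshold as a percentage change between successive iterations are assumptions.

```python
def alternate_until_converged(A, B, update_users, update_items, loglik,
                              tol_percent=0.1, max_iter=100):
    """Repeat steps a and b until the loglikelihood changes by no more than
    tol_percent per cent between iterations, or max_iter is reached.

    update_users(A, B) returns new user profiles with B held constant;
    update_items(A, B) returns new item profiles with A held constant;
    loglik(A, B) returns the loglikelihood of the data.
    """
    previous = loglik(A, B)
    for _ in range(max_iter):
        A = update_users(A, B)    # step a: item profiles held constant
        B = update_items(A, B)    # step b: user profiles held constant
        current = loglik(A, B)
        if abs(current - previous) <= tol_percent / 100.0 * abs(previous):
            return A, B, current
        previous = current
    return A, B, previous
```

Passing the update rules in as functions keeps the loop independent of the particular likelihood, so the same skeleton would serve for binary choices or for ratings.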

It would be apparent to someone skilled in the art that
the number of parameters in an item or user profile can
be varied by changing the specification of H and L, and
that the optimal number can be chosen to balance
requirements that the algorithm not use too much
processing power or storage, and that it gives accurate
recommendations. A further important factor is to avoid
overfitting of the data.

In a further preferred embodiment of a filtering engine
according to the invention, bias in the user history
data is corrected for. The information held in the user
history database can take a number of different forms.
It could hold whether or not the user has sampled an
item, or how the user rated an item if sampled. The
information may also be incomplete in the sense that the
user may have sampled an object, but not entered its
score into the database.

This means there are at least two potential sources of
selection bias. The first is that users will only have
sampled some of the objects. The second is that users
may not have entered into the database all the objects
they have sampled. In many cases users will be more
likely to sample objects that they are likely to rate
highly. They may also be more likely to enter
information about objects they liked. The effect is
that estimates of ratings based on standard statistical
analysis of the database of user histories will estimate
the ratings conditional on whether an object has been
sampled and recorded. The estimated conditional ratings
may be biased (inaccurate) estimates of the underlying
unconditional ratings.

In a still further embodiment of a filtering system
according to the invention, a maximum likelihood method
is used. The data records whether an item has been
sampled or not and, if sampled, what the rating was.

$$L(h \mid a, b) = \prod_{i,j} L(h_i^j \mid a_i, b_j)$$

is the likelihood of observing h. Choose a and b to
maximise this.

The following is a simple numerical example showing how
a method according to the invention might operate in
practice. As will be apparent, in the method described
below, the function modelling the data is solved using
an unobserved attribute model (UAM).

In this example, the history data set records whether or
not users have visited each of four attractions in the
South East of England. In the example there are four
users, and their histories are given in the following
table.



Table 1 - History h

         Brighton   National   Natural History   Legoland
                    Gallery    Museum
Alice       1          0              1              0
Ben         0          1              1              0
Carl        1          1              1              0
Dan         1          0              0              1



The likelihood function for the observed history assumes
that whether or not a user has visited an attraction is
an independent random variable, conditional on the
user's profile. The likelihood function for whether
user i has visited attraction j is:

$$L(h_i^j) = \begin{cases} \max\{0, \min\{1,\ a_i^1 b_j^1 + a_i^2 b_j^2\}\} & \text{if } h_i^j = 1 \\ 1 - \max\{0, \min\{1,\ a_i^1 b_j^1 + a_i^2 b_j^2\}\} & \text{if } h_i^j = 0 \end{cases}$$

and the overall likelihood of h is:

$$\prod_{i,j} L(h_i^j)$$

For simplicity user and object profiles are restricted
to belong to a set of discrete values, and the largest
value for each parameter in the object profile is
restricted to be equal to 1.

$$a_i^x \in \{0, 0.25, 0.5, 0.75, 1\}, \quad x = 1, 2$$
$$b_j^x \in \{0, 0.25, 0.5, 0.75, 1\}, \quad x = 1, 2$$
$$\max_j b_j^x = 1, \quad x = 1, 2$$

Choosing object and user profiles to maximise the
likelihood yields, as one solution:

Table 2 - User profiles

          a1      a2
Alice     0.5     0.5
Ben       1       0
Carl      1       0.5
Dan       0       1

Table 3 - Object profiles

                           b1      b2
Brighton                   0.5     1
National Gallery           1       0
Natural History Museum     1       0.25
Legoland                   0       0.75

The example was implemented using an Excel worksheet.
Initial values of all parameters were set to 0.5. Each
parameter was in its own cell. The likelihood of the
data was entered as a formula into a separate cell,
taking the parameters as arguments. The likelihood
function was then maximised by iterating manually
through the following steps.

1. Holding all other parameters constant, try all
possible combinations of the two parameters
relating to Alice. Retain that combination that
maximises the likelihood.

2. Do likewise for Ben, Carl and Dan in turn.

3. Holding all other parameters constant, try all
possible combinations of the two parameters
relating to Brighton. Retain that combination that
maximises the likelihood.

4. Do likewise for the National Gallery, Natural
History Museum and Legoland in turn.

5. Have any parameters changed? If yes then go back
to step 1. If no then stop.
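The manual worksheet procedure can be mimicked in a few lines of Python. The following is a sketch only: the names are hypothetical, a new combination is retained only when it strictly increases the likelihood, and the restriction that the largest value of each object parameter equals 1 is not enforced. The search may therefore stop at a solution different from the one tabulated above, which the text itself presents as only one solution.

```python
import itertools

GRID = [0.0, 0.25, 0.5, 0.75, 1.0]

# Table 1: rows are Alice, Ben, Carl, Dan; columns are Brighton,
# National Gallery, Natural History Museum, Legoland.
H = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
]


def visit_probability(a, b):
    """max{0, min{1, a1*b1 + a2*b2}} for user profile a and object profile b."""
    return max(0.0, min(1.0, a[0] * b[0] + a[1] * b[1]))


def likelihood(A, B):
    """Product over all users and attractions of the per-observation likelihood."""
    total = 1.0
    for h_i, a in zip(H, A):
        for h, b in zip(h_i, B):
            p = visit_probability(a, b)
            total *= p if h == 1 else 1.0 - p
    return total


def coordinate_search(A, B):
    """Cycle through users, then objects, keeping any pair of values that
    strictly increases the likelihood, until a full pass changes nothing."""
    A = [tuple(a) for a in A]
    B = [tuple(b) for b in B]
    changed = True
    while changed:
        changed = False
        for profiles in (A, B):
            for k in range(len(profiles)):
                original = profiles[k]
                best_value, best_profile = likelihood(A, B), original
                for candidate in itertools.product(GRID, repeat=2):
                    profiles[k] = candidate
                    value = likelihood(A, B)
                    if value > best_value:
                        best_value, best_profile = value, candidate
                profiles[k] = best_profile
                if best_profile != original:
                    changed = True
    return A, B
```

Starting from `coordinate_search([(0.5, 0.5)] * 4, [(0.5, 0.5)] * 4)` reproduces the worksheet's initial values of 0.5 for every parameter. Because each accepted change strictly increases the likelihood and the grid is finite, the loop must terminate.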
Once a solution has been obtained, the user and object
profiles for user i and object j can then be substituted
back into the function $L(h_i^j)$ to predict the
likelihood of user i wanting to visit object or
attraction j if they have not already done so.

In one example, the function R could be determined as
follows. If it is assumed that people are more likely
to visit attractions they will enjoy then an example for
the recommendation function R would be to base R on the
likelihood function L. Let
$R(a_i, b_j) = L(h_i^j = 1 \mid a_i, b_j)$ for those
attractions that user i has not visited ($h_i^j = 0$) and
set $R(a_i, b_j) = 0$ for those the user has visited. If
it is proposed to recommend one attraction to user i then
the recommendation should be to visit the attraction for
which $R(a_i, \cdot)$ is largest.
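A sketch of this recommendation function for the numerical example follows. The function names are hypothetical, and returning `None` when every attraction has already been visited is an added assumption.

```python
def visit_probability(a, b):
    """max{0, min{1, a1*b1 + a2*b2}} for user profile a and object profile b."""
    return max(0.0, min(1.0, a[0] * b[0] + a[1] * b[1]))


def recommend_attraction(a_i, B, h_i, names):
    """Name of the unvisited attraction with the largest R(a_i, b_j).

    R is the modelled visit probability for unvisited attractions and 0 for
    visited ones; None is returned if every attraction has been visited.
    """
    if all(h == 1 for h in h_i):
        return None
    ratings = [0.0 if h == 1 else visit_probability(a_i, b)
               for h, b in zip(h_i, B)]
    best = max(range(len(B)), key=lambda j: ratings[j])
    return names[best]
```

With the profiles of Tables 2 and 3, Alice's ratings for her two unvisited attractions are 0.5 for the National Gallery and 0.375 for Legoland, so the National Gallery would be recommended to her.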

In this example the data only indicates whether a user
has visited an attraction or not. In an alternative
embodiment the data holds ratings which indicate, for
those attractions which the user has visited and entered
information for, how much they enjoyed them. The
ratings held in the database are conditional on the user
having visited the attraction and having entered
information into the database. In these cases the
likelihood function and the history function that
estimated the conditional ratings could be based on a
combination of two other functions - one that estimated
whether any rating on an attraction was held, and one
that estimated the unconditional rating. The
recommendation function would then be based on the
estimated unconditional rating function. The simplest
case is to assume that whether a rating is held is
random when compared to the rating itself, so that the
unconditional rating is the same as the conditional
rating. In this case the recommendation function will
be directly related to the estimation function and there
is no need to correct for selection bias.

The function H could be determined in many ways. The
function models the data as a function of user and
object profiles. H is an explicit model of how the data
is generated in terms of the way that users make
choices.

To take some particular cases, in one embodiment the
data might record 1 if the user has both sampled the
object and recorded a vote, and 0 otherwise. Given the
type of objects in the database, a good model of the data
might assume that users are more likely to sample and
record votes for objects that are suitable, and that an
object is more likely to be suitable if its profile is
similar to the user's profile. So H will be a model of
the probability of sampling and recording as a function
of a distance between the user and object profiles, for
some distance metric. Then the profiles are chosen to
maximise the fit between what H predicts and the actual
data. In this case R would be the same as H because
there is no information available about suitability
other than the assumption that users are more likely to
select more suitable objects.

In another embodiment, the data records a user's rating
from 1 to 10 of an object if the user has both sampled the
object and recorded information on it. Given the type
of object, a good model of the data might assume that
users are more likely to sample and record votes for
objects that are suitable, but that sampling and
recording depend on other things as well, and that
suitability depends on the extent to which the user and
the object both have high levels of the same
characteristics. In this case one approach would be for
H to be a combination of:

1. a model of those votes where information on
suitability was recorded, as a model of suitability
conditional on sampling and recording, and

2. a model of whether a vote was recorded or not, as a
separate model of sampling and recording.

Both could take the inner product of the user and object
profiles as parameters.

It might be better, however, if H was based on a model of
the suitability unconditional on sampling and recording.
One way to do this would be to use an estimation
procedure that corrected for selection bias. An
alternative might be to estimate in one go a single
function that was the product of a selection equation
and a suitability equation. If, however, there was no
correlation between selection and suitability then there
would be no need to correct for selection bias. The
best model will depend on the data.

This method can be implemented using known techniques
for correcting for selection bias in the F module (where
case profiles are treated as known and the goal is to
estimate the item profiles), such as Heckman regression.
In one example: (i) the unconditional rating is modelled
as being linearly related to the case profile, where the
coefficients are components of the item profile; (ii)
selection (or sampling) is modelled using a logit model
where the parameter that enters the inverse logit
function is linearly related to the case profile, and
where the coefficients are components of the item
profile; (iii) all components in the case profiles enter
into the model of selection and at least one component
of a case profile does not enter into the model of
ratings; and (iv) the components of the item profile that
enter into the selection model are different from those
that enter into the model of unconditional observations.
The Heckman regression is well known and is available
preprogrammed for a number of specific functional forms,
including the ones mentioned above, in the STATA
statistical package.

Recommendations would be based on the unconditional
suitability, and so, depending on the modelling choices
made, could differ from estimates of H.

Figure 2 shows a frame within a page of the website
according to the invention. This website could use any
of the various filtering methods according to the
invention as described herein. The web page contains a
frame into which the user inputs data relating to their
preferences, as well as the frame shown in Figure 2.
This frame 2 includes a list 4 of the top five objects
which the user is most likely to prefer. Also included
in the frame is a personalisation sliding scale 6 which
indicates to the user the degree of personalisation of
the recommendations which they are provided with. As
shown, the scale indicates the degree of personalisation
as a score in the range of 0 to 100%. Each time that
the user inputs a new piece of data, the recommendations
provided will be updated and the personalisation score
will also be updated. Although not shown in Figure 2,
the recommendations provided to the user are displayed
on the same web page as the personalisation sliding
scale, thus providing the user with a motivation for
inputting more data about themselves.

In a further alternative embodiment of the invention,
the off-line profile engine operates as follows:

1. Receive the set of user histories

$$H = \{h^i\}_i \qquad \text{(A)}$$

2. Receive a likelihood function for the user
histories:

$$\mathcal{L}(H \mid A, B) = \prod_i \mathcal{L}(h^i \mid a^i, B) = \prod_i \prod_j L^h(h^{ij} \mid a^i, b^j) \qquad \text{(B)}$$

The arguments of the likelihood function are:
A set of user profiles $A = \{a^i\}_i$
A set of object profiles $B = \{b^j\}_j$

The way in which the likelihood function is derived for
a particular set of user histories is described in the
examples which follow.

3. Maximise the likelihood function by an iterative
process in order to obtain the object and user profiles

$$(A, B) = \arg\max_{A', B'} \mathcal{L}(H \mid A', B') \qquad \text{(C)}$$

4. Use the set of point estimates of the user profiles
(one for each user in the history database) to generate
a prior distribution $\alpha^0$ over possible user profiles,
$\mathcal{A}$

$$\alpha^0(a) = f(a \mid A); \quad a \in \mathcal{A} \qquad \text{(D)}$$

where the user profiles for each user in the history
database, $\{a^i\}_i$, are represented by $A$.

The real-time Bayesian recommendation engine is then 
operated as follows: 

1. Information about a particular user's history is
received into the recommendation engine.

2. A prior probability distribution over possible
profiles for the user, $\alpha^0$,

a point estimate of profiles for each item

$$B = \{b^j\}_j,$$

and a likelihood function for histories

$$\mathcal{L}(h \mid a, B) = \prod_j L^h(h^j \mid a, b^j)$$

are received from the off-line profile engine.
3. A posterior probability distribution over possible
profiles is generated for the user by updating the prior
probability distribution in the light of data using
Bayesian inference and the likelihood function.

$$\alpha^1(a) = \frac{\alpha^0(a)\,\mathcal{L}(h^i \mid a, B)}{\sum_{a'} \alpha^0(a')\,\mathcal{L}(h^i \mid a', B)}$$

4. A point estimate of profiles for each item

$$B = \{b^j\}_j$$

and a likelihood function for ratings

$$L^r(r \mid a, b^j)$$

are received from the off-line profile generator.

5. A probability distribution over possible ratings
for items (for which there are no votes) is generated
using the likelihood function and integrating over
possible profiles.

$$\rho(r \mid \alpha^1, b^j) = \sum_a \alpha^1(a)\,\frac{L^r(r \mid a, b^j)}{\sum_{r'} L^r(r' \mid a, b^j)}$$

6. A point estimate of the likely rating for each item
is generated using the probability distribution over
possible ratings for each item obtained at 5.

7. The point estimate of the likely rating is used to
output information to the user in the required form.
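
The real-time steps above can be sketched in a few lines of Python for a discrete space of candidate profiles. This is only an illustrative sketch: the function names, the dictionary representation of the prior, and the toy likelihood `toy_lik` are assumptions made for the example and are not taken from the specification.

```python
from math import prod


def posterior(prior, history, item_profiles, lik_h):
    """Step 3: Bayes update of a discrete prior over candidate profiles.

    prior         -- dict: candidate profile (tuple) -> prior probability
    history       -- dict: item id -> observed history value
    item_profiles -- dict: item id -> item profile (tuple)
    lik_h         -- function (h, a, b) -> likelihood of observation h
    """
    weights = {
        a: p * prod(lik_h(h, a, item_profiles[j]) for j, h in history.items())
        for a, p in prior.items()
    }
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def rating_distribution(post, b, ratings, lik_r):
    """Step 5: distribution over ratings, summing over candidate profiles."""
    dist = {r: 0.0 for r in ratings}
    for a, p in post.items():
        norm = sum(lik_r(r, a, b) for r in ratings)
        for r in ratings:
            dist[r] += p * lik_r(r, a, b) / norm
    return dist


def expected_rating(dist):
    """Step 6: point estimate taken as the expectation of the rating."""
    return sum(r * p for r, p in dist.items())


def toy_lik(value, a, b):
    """Hypothetical likelihood: P(value == 1) is the product a[0] * b[0]."""
    p = a[0] * b[0]
    return p if value == 1 else 1.0 - p


# Made-up inputs: two candidate profiles, two items, one observed history.
prior = {(0.2,): 0.5, (0.8,): 0.5}
items = {"x": (1.0,), "y": (0.5,)}
post = posterior(prior, {"x": 1}, items, toy_lik)
dist_y = rating_distribution(post, items["y"], (0, 1), toy_lik)
estimate_y = expected_rating(dist_y)
```

With these made-up inputs, observing a 1 for item "x" shifts the posterior weight towards the profile (0.8,), and the point estimate for the unseen item "y" is the posterior-weighted expected rating.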

The functioning of the off-line profile engine and the
on-line Bayesian recommendation engine has been
described above in terms of the space of allowable user
profiles being discrete. However, as would be apparent
to the skilled person, the modules could be modified to
allow for a continuous space of allowable profiles.



In an alternative mode of filtering data to provide
recommendations to a user, the user and object profiles
obtained are used together with the user profile for the
user requiring a recommendation to estimate the
preferences of that user for a plurality of objects. An
example of such a filtering method is given below. It
will be appreciated that the iterative method by which
the likelihood function modelling the data set was
solved in this example is equally applicable to the
solution of the likelihood function in the off-line
profile engine of the present invention.

This example was implemented using the S-PLUS
statistical software package.

In the examples there are 20 users and 5 objects. The
data is binary and complete, so that every $h_{ij}$ is either
1 or 0. $h_{ij}$ is equal to 1 if and only if user i has
sampled object j. The aim of the filter in this case is
to model the process that has generated user sampling
choices so far.

Recommendations are based on identifying those items
that the user is most likely to sample next. The
recommendation function in this case is the estimated
probability that the particular user has sampled the
particular item. It is assumed that the task is to
recommend to a new user which single item she should
sample next. The recommendation is to sample the as yet
unsampled item to which the model assigns the highest
probability.

The likelihood function L is defined via a scoring
function $s(\cdot, \cdot)$ that models the probability that a
particular item has been sampled by a particular user.



WO 02/10954 



PCT/GB01/03383 




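The full definitions are:

$$L(h \mid a, b) = \begin{cases} s(a, b) & \text{if } h = 1 \\ 1 - s(a, b) & \text{if } h = 0 \end{cases}$$

where

$$s : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}, \quad (a, b) \mapsto \phi(\langle a, b \rangle)$$

$$\phi : \mathbb{R} \to \mathbb{R}, \quad x \mapsto \frac{1}{1 + \exp(-4(x - 0.5))}$$

and $\langle a, b \rangle$ is the inner product of the vectors a and b.

The history function H(a,b) is taken as the most likely
outcome given the estimated parameters, so that:

$$H : \mathbb{R}^2 \times \mathbb{R}^2 \to \{0, 1\}, \quad (a, b) \mapsto \arg\max_{h \in \{0, 1\}} L(h \mid a, b)$$

The dataset is complete and the recommendation function
is just the scoring function:

$$R(\cdot, \cdot) = s(\cdot, \cdot).$$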
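
The definitions above translate directly into code. The following Python sketch is illustrative only; the function names and the example profiles are hypothetical, and the original example was implemented in S-PLUS rather than Python.

```python
import math


def phi(x):
    """Logistic link with slope 4 centred at 0.5, as in the definitions."""
    return 1.0 / (1.0 + math.exp(-4.0 * (x - 0.5)))


def score(a, b):
    """Scoring function s(a, b): phi applied to the inner product <a, b>."""
    return phi(sum(ai * bi for ai, bi in zip(a, b)))


def likelihood(h, a, b):
    """L(h | a, b) for a binary observation h."""
    s = score(a, b)
    return s if h == 1 else 1.0 - s


def history(a, b):
    """History function H(a, b): the more likely of the outcomes 0 and 1."""
    return max((0, 1), key=lambda h: likelihood(h, a, b))


def recommend(a, item_profiles, sampled):
    """Index of the unsampled item with the highest score R = s."""
    candidates = [j for j in range(len(item_profiles)) if j not in sampled]
    return max(candidates, key=lambda j: score(a, item_profiles[j]))


# Hypothetical profiles, chosen only to exercise the functions.
user = (1.0, 0.0)
items = [(0.9, 0.0), (0.2, 0.0), (0.6, 0.0)]
choice = recommend(user, items, sampled={0})
```

Because item 0 is already sampled, the recommendation falls on whichever of the remaining items has the larger inner product with the user profile.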

It is assumed that each user and object is associated
with a vector of two parameters. We have sought to find
parameters for the users and objects that maximise the
overall likelihood of the data using an iterative
procedure as described herein. Parameters were
restricted to lie between 0 and 1. Initial values for
all parameters were chosen at random. At each iteration
the current value was replaced with a linear combination
of the current value and whatever value maximised the
likelihood, holding parameters for all other objects or
users constant (in practice we used the natural log of
the likelihood, as the likelihood itself was too small).
Iterations continued until the improvement in the
log-likelihood between successive iterations was less
than a specified tolerance. In the examples the
tolerance was set at 0.01, i.e. roughly a one percent
improvement in the likelihood.
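
A minimal Python sketch of such an iterative procedure is given below. Several details are assumptions made for illustration and are not stated in the text: the maximising value is found by a coarse grid search over [0, 1] (with the current value added as a candidate), the weight `step` of the linear combination is 0.5, an iteration cap `max_iter` is imposed, and the binary data are randomly generated rather than taken from the appendices.

```python
import math
import random
from itertools import product

GRID = [k / 10 for k in range(11)]  # assumed search grid on [0, 1]


def score(a, b):
    """s(a, b) = phi(<a, b>) with phi(x) = 1 / (1 + exp(-4 (x - 0.5)))."""
    x = sum(ai * bi for ai, bi in zip(a, b))
    return 1.0 / (1.0 + math.exp(-4.0 * (x - 0.5)))


def log_lik(h, a, b):
    """Natural log of L(h | a, b) for a binary observation h."""
    s = score(a, b)
    return math.log(s if h == 1 else 1.0 - s)


def total_log_lik(H, A, B):
    """Log-likelihood of the complete data set."""
    return sum(log_lik(H[i][j], A[i], B[j])
               for i in range(len(A)) for j in range(len(B)))


def damped_update(current, objective, step):
    """Move part of the way from `current` towards the best candidate."""
    candidates = list(product(GRID, repeat=len(current))) + [current]
    best = max(candidates, key=objective)
    return tuple((1.0 - step) * c + step * t for c, t in zip(current, best))


def fit(H, dim=2, step=0.5, tol=0.01, max_iter=50, seed=0):
    """Alternate over users and objects until the gain falls below tol."""
    rng = random.Random(seed)
    n_users, n_objects = len(H), len(H[0])
    A = [tuple(rng.random() for _ in range(dim)) for _ in range(n_users)]
    B = [tuple(rng.random() for _ in range(dim)) for _ in range(n_objects)]
    trace = [total_log_lik(H, A, B)]
    for _ in range(max_iter):
        for i in range(n_users):  # update user i, all else held constant
            A[i] = damped_update(
                A[i],
                lambda a: sum(log_lik(H[i][j], a, B[j])
                              for j in range(n_objects)),
                step)
        for j in range(n_objects):  # update object j, all else held constant
            B[j] = damped_update(
                B[j],
                lambda b: sum(log_lik(H[i][j], A[i], b)
                              for i in range(n_users)),
                step)
        trace.append(total_log_lik(H, A, B))
        if trace[-1] - trace[-2] < tol:
            break
    return A, B, trace


# Made-up binary data: 20 users and 5 objects, as in the example's dimensions.
data_rng = random.Random(1)
H = [[data_rng.randint(0, 1) for _ in range(5)] for _ in range(20)]
A, B, trace = fit(H)
```

Including the current value among the candidates means each single-profile update cannot lower the log-likelihood, since each per-profile objective here is concave; this is a design choice of the sketch rather than something the text specifies. Different seeds give different starting points, which mirrors the practice of running the procedure from several initial conditions.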


We followed the iterative procedure three different
times using a different set of initial conditions each
time. Of these runs, two appear to converge on a similar
maximum, giving similar values for the likelihood and
similar values for the parameters. The likelihood for
these two was slightly higher than for the other run.
All three appear to be good approximations to parameters
that maximise the likelihood.

Once each run had converged we calculated the history
function and gave a recommendation for a new user. All
three sets of profiles gave the same recommendation.

In this example we used the iterative procedure to
arrive at three sets of profiles, each of which appears
to be a good approximation to parameters that maximise
the likelihood. Someone skilled in the art would be
able to arrive at a single preferred approximation using
a number of methods, for example running the iterative
procedure a fixed number of times and choosing those
profiles that gave the highest likelihood.

There are three appendices accompanying this example.
The first (Appendix D) defines the functions. The
second (Appendix E) gives a complete session log for the
first of the three runs. The third (Appendix F)
summarises the results for each of the three runs.

The structure of the user history data set obtained in
the filtering method of the invention may take various
forms. Two alternative embodiments of the invention
using different forms of data are set out below.

In the first embodiment, the data records whether or not
a user has sampled an item, or whether or not the user
has recorded sampling an item. The data is complete.

In this case there is no distinction between ratings and
histories.

$$h^{ij} = r^{ij} = \begin{cases} 1 & \text{if the user has sampled item } j \\ 0 & \text{otherwise} \end{cases}$$

Alternatively:

$$h^{ij} = r^{ij} = \begin{cases} 1 & \text{if the user has recorded that she has sampled item } j \\ 0 & \text{otherwise} \end{cases}$$

Because histories and ratings are the same, the
likelihood functions for the two are the same.

$$L^h(h^j \mid a, b^j) = L^r(h^j \mid a, b^j)$$

In the second embodiment, the data records user
preferences over items. The data is incomplete, in that
each user has recorded preferences for only a subset of
the available items.

Each element of data is the product of two variables.
The sample variable $s^{ij}$ records whether a particular user
has recorded a rating for item j.

$$s^{ij} = \begin{cases} 1 & \text{if the user has visited attraction } j \\ 0 & \text{otherwise} \end{cases}$$

The rating variable $r^{ij}$ records the user's rating for
attraction j.

The user's history for attraction j is the product of
these two variables.

$$h^{ij} = s^{ij} r^{ij}$$
In general there will be selection bias - users will be 
more likely to give ratings for items they rate highly. 
If so then a user's selections are informative about how 
they would rate currently unrated items. 

To capture this information the likelihood that a user 
selects a particular item is modelled as a function of 
the user and object profiles and it is assumed that, 
conditional on profiles, selection and rating are 
independent. This independence assumption means the 
likelihood of the history can be decomposed as follows. 



The following is a specific example of an application of 
the filtering method of the invention. 

Data records user preferences over some London area
attractions from a set of available alternatives. Each
element of data is the product of two variables. The
sample variable $s^{ij}$ records whether a particular user has
been to attraction j.

$$s^{ij} = \begin{cases} 1 & \text{if the user has visited attraction } j \\ 0 & \text{otherwise} \end{cases}$$

The rating variable $r^{ij}$ records whether the user likes
attraction j or not.

$$r^{ij} = \begin{cases} 2 & \text{if the user likes the attraction} \\ 1 & \text{if the user does not like it} \end{cases}$$

The user's history for attraction j is the product of
these two variables.

$$h^{ij} = s^{ij} r^{ij}$$

The information on ratings will be incomplete as users
will only record ratings for attractions they have
visited. The definitions are nevertheless complete
since $h^{ij} = 0$ for unvisited attractions, whatever value
$r^{ij}$ takes.

Each user and object profile is made up of three
attributes. The first user attribute determines the
distribution of $s^{ij}$. The first item attribute has no
effect and is set to 0. The second and third attributes
from the profiles together determine the distribution
for $r^{ij}$.

$$a = (a_1, a_2, a_3)$$

Prior beliefs about a user's profile are generated by
taking an average over the profiles of all other users.

$$\alpha^0(a) = f(a \mid A) = \frac{\sum_i \mathbf{1}(a^i = a)}{N}$$

where N is the number of users and

$$\mathbf{1}(a^i = a) = \begin{cases} 1 & \text{if } a^i = a \\ 0 & \text{otherwise} \end{cases}$$
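
Read in this way, the prior is simply the empirical distribution of the estimated user profiles. A minimal Python sketch, in which the function name and the example profiles are hypothetical:

```python
from collections import Counter


def empirical_prior(profiles):
    """Prior probability of each distinct profile: its share among N users."""
    counts = Counter(tuple(a) for a in profiles)
    n = len(profiles)
    return {a: c / n for a, c in counts.items()}


# Made-up point estimates for four users, each with three attributes.
estimated = [(0.5, 0.0, 1.0), (0.5, 0.0, 1.0), (0.2, 1.0, 1.0), (0.9, 1.0, 0.0)]
prior = empirical_prior(estimated)
```

Profiles that occur for several users receive proportionally more prior weight, and profiles that no user has receive none.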



The likelihood functions for histories and ratings are 
related. Conditional on the user and item profiles, the 
probability that a user has sampled item j and the 
user's rating for that item are independent. 

$$L^h(h^j \mid a, b^j) = \begin{cases} L^s(0 \mid a, b^j) & \text{if } s^j = 0 \\ L^s(1 \mid a, b^j)\,L^r(r^j \mid a, b^j) & \text{if } s^j = 1 \end{cases}$$



The probability of sampling each item is independent of
the object profiles and is constant across objects. The
probability for each item differs across users and is
given by the first attribute of the user profile.

$$L^s(s^j \mid a, b^j) = \begin{cases} a_1 & \text{if } s^j = 1 \\ 1 - a_1 & \text{if } s^j = 0 \end{cases}$$



The probability that the user likes an item is an
increasing function of the inner product of the user's
profile and the profile of the item, ignoring the first
attributes.

$$L^r(r^j \mid a, b^j) = \begin{cases} g(a, b^j) & \text{if } r^j = 2 \\ 1 - g(a, b^j) & \text{if } r^j = 1 \end{cases}$$

where

$$g(a, b^j) = \frac{1}{1 + \exp(-4(a_2 b_2^j + a_3 b_3^j - 0.5))}$$



In this example there is no overlap between the
attributes that affect selection and those that affect
rating. The consequence of this is that selection and
rating are independent, even without conditioning on
profiles. This feature allows a simplification.

When estimating the profile of the user requesting a
recommendation we can, in effect, treat profiles as
containing just the last two attributes, and use the
likelihood function for ratings in place of the more
complex likelihood function for histories.

The likelihood function used would be:

$$L^h(h^j \mid a, b^j) = \begin{cases} 1 & \text{if } s^j = 0 \\ L^r(r^j \mid a, b^j) & \text{if } s^j = 1 \end{cases}$$



The recommendation task is to identify the three
attractions which the user has not yet visited and which
she is most likely to like. To derive a point estimate
of the likely rating for each item, assume that the
numerical ratings themselves are meaningful so that we
can use the expectation of the ratings for an item as
our estimate.

$$r_e^j = E[r^j] = \sum_r r\,\rho^j(r)$$

where $\rho^j(r)$ is the probability assigned to rating r for
item j.

Identify those three items with the highest estimated
ratings, and which the user has not yet sampled, and
output an identifier for them.

The profile engine treats the item profiles as unknown 
parameters and estimates them to fit the user histories 
in the database. 



A standard statistical procedure for estimating unknown
parameters is to choose those parameters that maximise
the likelihood of the data being present. However, in
the embodiment of the method described below, the
profile engine models the likelihood of the data being
present as a function depending on some hidden variables
(the user profiles). Thus, to solve the function, the
hidden variables are represented by a distribution over
possible values and the likelihood of the data is then
maximised when the expectation is taken over the
distribution. It will be appreciated that this is the
approach to estimation used in latent variable analysis,
which is a known statistical technique.

The following defines the notation used in the 
description of the profile engine. 

As discussed above, a database of user histories is
input to the profile engine. Each user history
comprises a set of observations that record what is
known about the user's actions and preferences.

The set of users in the database is denoted by:
$I = \{1, 2, \ldots, I\}$.

The set of items in the database is denoted by:
$J = \{1, 2, \ldots, J\}$.

An observation about item j and user i is denoted as $h_i^j$.

The set of all user histories in the database is denoted
by $H = \{h_1, h_2, \ldots, h_I\}$, where a user history is the
set of all observations for a particular user (user i) and
is denoted by: $h_i = \{h_i^1, h_i^2, \ldots, h_i^J\}$.

If the data for a user recorded whether or not they had
been to Greece, then allowable values for Greece (the
item) would be true, false or missing. Alternatively,
if data were collated showing the age of a user, then
the item could have any integer value or could be
missing.

In addition to the database of user histories, a
function which models the loglikelihood of the user
histories in the database, LL(H|B), is also input to the
profile engine. This function returns the likelihood of
a set of user histories as a function of given item
profiles and a probability distribution over possible
user profiles. Thus, user profiles are not observed by
this function, and knowledge about them is represented
as a probability distribution over possible profiles.

The loglikelihood function is a function of a set of
user histories H and a set of item profiles B. The user
profiles are assumed to be drawn from a set of possible
profiles. Each user profile is a vector of components.

In the user profile notation, $Q_a$ is the number of
components in a user profile, A is the set of possible
user profiles, and $a = (a_1, a_2, \ldots, a_{Q_a})$ is a typical
element of A.

As discussed above, the loglikelihood function uses an
assumed prior distribution over user profiles in the
data set. The prior probability that a user's profile
is a is denoted as $\alpha(a)$.

The prior probability in latent variable analysis would 
normally derive from the assumption that each component 
in the user profile is distributed as standard normal 
and the components are independent. However, it has 
been shown by past research that the actual prior 
distribution assumed in latent trait analysis has little 
effect on the results obtained. Changes in the mean and 
variance of the assumed distribution would lead to a 
translation of the estimated item profiles that however 
would not affect the fit of the data model or of a 
prediction obtained using them. Empirical tests have 
shown that the form of the distribution has only a small 
effect on the results of latent variable models. 

The profile engine of the present invention is described
here in discrete form and so the prior distribution used
for each component, $\alpha_q(a_q)$, is a discrete approximation to
a standard normal distribution.

To simplify the exposition, the loglikelihood function
is expressed in terms of a likelihood of a user history,
L(h|B,a), and that in turn is expressed in terms of the
likelihood of an observation, $f(h^j \mid a, b)$.

The function $f(h^j \mid a, b)$ gives the likelihood of
observation $h^j$ about a particular item and user, given
that the item profile is given by b and the user's
profile is given by a.

In a preferred embodiment of the profile engine for
binary data, all items are binary variables which take
either value 0 or 1 or missing, or equivalently are
either true or false or missing. An example is where
each item is a possible action, such as "watch Titanic",
and the user history records whether the user has taken
each action, or whether no information is available on
the action. The likelihood that a variable is TRUE is
given by the inverse logit function, where the argument
depends on the item and user profile as:

$$f(h^j \mid a, b) = \begin{cases} \operatorname{logit}^{-1}\left(b_0 + \sum_q a_q b_q\right) & \text{if } h^j = 1 \\ 1 - \operatorname{logit}^{-1}\left(b_0 + \sum_q a_q b_q\right) & \text{if } h^j = 0 \\ 1 & \text{if } h^j = \bullet \end{cases}$$

where $\operatorname{logit}^{-1}(x) = 1/(1 + \exp(-x))$ and $h^j = \bullet$ means that
the observation is missing.

The logit function is commonly used in regression models
where the goal is to model the outcome of a binary
variable.



Once $f(h^j \mid a, b)$ has been defined, this can be used in the
likelihood of a user history given a set of item
profiles and a user profile. The likelihood of user
history h given that the item profiles are given by B
and the user's profile is a is: L(h|a,B). To derive the
expected likelihood of the set of user histories, it is
assumed that the user and item profiles contain all the
information which is needed to predict the observation
so that the likelihood of each observation is
conditionally independent, given the item and user
profiles. As a result, the likelihood of a user's
history is the product of the likelihood of each
observation, i.e.

$$L(h \mid a, B) = \prod_j f(h^j \mid a, b^j)$$

From the likelihood of a user history, the expected
loglikelihood of the set of user histories can be found.
The loglikelihood is LL(H|B) = ln L(H|B), where L(H|B) is
the expected likelihood of the set of user histories
given the item profiles. To derive the expected
likelihood of a set of user histories it is assumed that
the user and item profiles contain everything needed to
predict the observation, so that the likelihood of each
observation is conditionally independent, given the item
and user profiles. As a result, the likelihood of a
user's history is the product of the likelihood of each
observation, and the likelihood of all histories is the
product of the likelihood of each user's history.
Thus:

$$L(H \mid B) = \prod_{i \in I} \sum_{a \in A} L(h_i \mid a, B)\,\alpha(a)$$

giving a loglikelihood of:

$$LL(H \mid B) = \sum_{i \in I} \ln \sum_{a \in A} L(h_i \mid a, B)\,\alpha(a)$$
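
The loglikelihood above can be evaluated directly when the space of user profiles is discrete. The Python sketch below is illustrative only: it assumes the binary logit observation model with an intercept as the first item-profile component, represents a missing observation as `None`, and uses made-up item profiles and a made-up prior.

```python
import math


def inv_logit(x):
    """Inverse logit: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def f(h, a, b):
    """Likelihood of one observation; b[0] is the intercept b_0."""
    if h is None:  # a missing observation contributes a factor of 1
        return 1.0
    p = inv_logit(b[0] + sum(aq * bq for aq, bq in zip(a, b[1:])))
    return p if h == 1 else 1.0 - p


def user_likelihood(h_i, a, B):
    """L(h_i | a, B): product over items of the observation likelihoods."""
    out = 1.0
    for h, b in zip(h_i, B):
        out *= f(h, a, b)
    return out


def log_likelihood(H, B, prior):
    """LL(H | B): sum over users of the log of the prior-weighted likelihood.

    prior is a list of (profile, probability) pairs over candidate profiles.
    """
    return sum(
        math.log(sum(user_likelihood(h_i, a, B) * w for a, w in prior))
        for h_i in H
    )


# Made-up inputs: two items, three users, two equally likely candidate profiles.
B = [(0.0, 1.0), (-1.0, 2.0)]
prior = [((-1.0,), 0.5), ((1.0,), 0.5)]
H = [[1, 0], [0, None], [1, 1]]
ll = log_likelihood(H, B, prior)
```

An optimiser that searches over B to maximise `log_likelihood` would then play the role of the profile engine; the sketch covers only the evaluation of the function for given item profiles.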

It will be appreciated that in the profile engine method
described it is assumed that one observation is made per
item. It would of course be possible, however, to modify
the profile engine for situations in which more than one
observation were made, and it would be apparent to a
person skilled in the art how to do this.

In addition, the profile engine described is set up to
handle attendance data in which each observation has a
value of either 0 or 1. Such a data structure would
arise when items were movies or places, for example, and
the data recorded whether or not a user had visited an
item.

The profile engine could however be modified to deal
with other types of data and again, it would be apparent
to one skilled in the art how to do this.



The database of user histories and the loglikelihood
function defined above are input to the profile engine
in use and the loglikelihood function is solved to find
the item profiles which maximise the function for the
data set. Each item profile found is a vector of
components defining characteristics of an item. The
profile engine specifies the number of vector components
to be included in each item profile.

When choosing the number of components in a user
profile, there are two effects which need to be
balanced. Increasing the number of vector components
will increase the number of parameters that are
estimated by the item profile engine. On the one hand
this will give the model greater scope to fit complex
relationships between the variables and improve its
ability to predict behaviour out of sample. On the
other hand it will also increase the scope of the model
to fit idiosyncratic features of the data which are not
seen in out-of-sample cases. This will harm the model's
ability to make good predictions.

One method which can be used to balance these two
effects in order to select the model that gives the best
predictions is the Akaike Information Criterion (the
AIC). The method looks for the model that maximises a
measure of the likelihood of the data, but subject to a
penalty term that increases as the number of parameters
increases. More precisely, if B is the set of item
profiles that maximises the expected likelihood, and p
is the number of parameters, then the AIC is:

$$\mathrm{AIC} = -2\,LL(H \mid B) + 2p$$

The selection rule is to choose the model that minimises
the AIC.

In the present method, the parameters in the model are
the item profiles. Each item profile is a list of Q+1
numbers, where Q is the number of components in a user
profile. Selecting on the basis of the AIC leads to

$$Q = \arg\min_X \left[ -2\,LL(H \mid B) + 2(X + 1)J \right]$$

where B is the set of item profiles that maximise the
expected loglikelihood of the data.

In practice, other considerations militate against
having a large number of components. A large number of
components means that the complexity of the user profile
is greater, and this can slow down the process of making
recommendations. In some contexts, an administrator may
wish to attach meanings to the components and this will
be harder if there are many components. The following
procedure is therefore carried out in practice:

1. Estimate the model with Q = 1, 2 and 3.

2. Estimate the AIC for each number of components.

3. Select the model with the lowest AIC.
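
The three-step procedure above amounts to a small selection routine. In the Python sketch below the function names are hypothetical and the loglikelihood values are made-up numbers used only to show the arithmetic; in practice they would come from estimating the model for each Q.

```python
def aic(log_lik, n_components, n_items):
    """AIC = -2 LL + 2 p, with p = (Q + 1) * J item-profile parameters."""
    p = (n_components + 1) * n_items
    return -2.0 * log_lik + 2.0 * p


def select_components(log_liks, n_items):
    """Return the number of components Q whose model has the lowest AIC.

    log_liks maps each candidate Q to the maximised loglikelihood LL(H | B).
    """
    return min(log_liks, key=lambda q: aic(log_liks[q], q, n_items))


# Illustrative (made-up) maximised loglikelihoods for Q = 1, 2, 3 and J = 5.
example_lls = {1: -120.0, 2: -105.0, 3: -103.0}
best_q = select_components(example_lls, n_items=5)
```

With these illustrative numbers the AIC values are 260, 240 and 246, so Q = 2 is selected: the third component improves the fit by less than the penalty it incurs.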


In an alternative embodiment, no balancing method is
carried out and the number of components is set at 2.
Experiments suggest that in many cases the predictive
performance of a model with 2 components is good
although not perfect. The main advantage of using such
a small number of components is that it is easy to
display the resulting item profiles graphically, which
is beneficial in cases where the administrator of the
system wants to have an intuitive indication of the
basis of the engine's recommendations.

The item profile for item j is denoted by
$b^j = (b_0^j, b_1^j, \ldots, b_Q^j)$, where Q+1 is the number of
components in the item profile and $b_q^j$ is the value of
component q of the profile for item j. The set of item
profiles, B, is denoted by $B = \{b^1, b^2, \ldots, b^J\}$.

In a preferred embodiment, the functions in the item
profile engine are set up such that $Q_a = Q$, which means
that the number of components in a user profile is one
less than the number of components in an item profile.

The item profiles are estimated as those parameters that
maximise the history loglikelihood function, i.e.

$$B = \arg\max_X LL(H \mid X)$$

A discussion of appropriate methods of solving equations
of this type which arise in latent variable analysis is
to be found in "Latent Variable Models and Factor
Analysis", by David Bartholomew and Martin Knott, publ.
Arnold 1999. Particular methods of solving a functional
form of the equation for B which arises when attendance
data is analysed are described by Bartholomew and Knott
at sections 4.5-4.13 of their book. In the preferred
method of solving for B, a program known as TWOMIS and
referred to in the book, which uses the EM algorithm
described in section 4.5 of the book, is used. This
algorithm estimates the equation by an iterative process
in which the gradient of the function is written in two
parts and one part of the gradient is held constant for
each iteration of the algorithm.



The user histories in the database could include only
information relating to the choices made by users for
certain items (i.e. their preferences). The filtering
method of the invention assumes that the user's choices
are a stochastic function of the user and item profiles.
In observing a user's choices, beliefs about the user's
profile can be updated and, in this way, more is learnt
about the user's likely future choices. In many cases
however, the method is not restricted to considering a
user's past choices. It is also possible to learn about
a user's likely future choices from other information
about the user, such as demographic information.

Further, in the method described below, the user and
item profiles are interpreted as causing user choices.
Alternatively however, the user choices could be
interpreted as being correlated random variables and so
the profiles are treated as a way to facilitate a
parsimonious representation of the correlation structure
between them. It is because these random variables are
correlated that knowing the realisation of one helps
predict realisations of the others, and the predictive
content of a user's choices is summarised by his or her
posterior profile. Thus, in this interpretation, the
profiles do not cause user choices but rather they track
what previous choices indicate about possible future
choices. Under this alternative interpretation,
information about a user can be interpreted in the same
way as observations about his or her choices. Thus, the
correlation between random variables can be modelled
using user profiles in the same way as with information
about choices.

Thus, information about users can be introduced into the
framework by using the following steps for each new kind
of information:

1. Create a new item with index k.

2. Define the values that observations relating to the
information, $h_k$, can take.

3. Define the likelihood of an observation as the
stochastic relationship between a user's profile,
$a_i$, the profile of the new item, $b_k$, and the
possible values of the observation: $f(h_k \mid a_i, b_k)$.

4. Estimate all the item profiles together, treating
this new item in just the same way as observations
about users' choices.

In the following example, the database of user histories
records whether or not a user has visited various
attractions (i.e. the observations about user choices
are binary). Graphical analysis of the contents of the
database suggests that the average age of a user's
children is informative about which attractions the user
has visited. Thus, information about the average age of
a user's children is added into the model of the
dataset.

A simple way to introduce information about average
child age is to create another item which records the
information as an additional observation about a user.
Instead of the observation relating to a choice the user
has made, it relates to non-choice information about a
particular subject. It is necessary to define the
allowable values for this item. In this case average
child age is treated as a binary variable which records
whether or not the user has older children. This
approach is particularly simple to describe and to
interpret as it means that all the items are of the same
type. Moreover graphical analysis suggests that this
approximation may be reasonable given that the true
relationship between average child age and visiting
behaviour is not always monotonic. It will be clear,
however, that a number of other approaches are possible. For
example average child age could be approximated as a
continuous variable. The method is not restricted to
cases where all variables have the same type.

The cut-off between older and not-older children has
been chosen to be 10 years old. This value is chosen as
being reasonable in light of simple graphical analysis
of the average child age for users visiting the various
attractions. It will be clear, however, that
alternative methods of arriving at the cut-off could
have been used. For example various values could have
been tried and the fit and performance of the model
compared, or an automatic routine to choose the cut-off
that maximises the likelihood of the data could have
been created.

To introduce information about average child age the
following steps were carried out:

1. Create an item that records whether or not the user
has children with an average age of 10 or above.
The item index is denoted OLD.

2. Assume that the relationship between a user's
profile and whether or not they have children with
an average age of 10 or above can be approximated
as a logistic curve:

$f(h_{OLD} \mid a, b) = $ [right-hand side of the equation is not legible in the source]

3. Treat this new item identically to the items that
record whether or not the user has visited each of
the attractions.

A numerical example of a data filtering method which
includes an item representing average child age is given
in Appendix G.
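
The steps for the OLD item can be sketched in Python as follows. This is only an illustrative sketch: the function names are hypothetical, and the parameterisation of the logistic curve (an intercept `b0` plus one weight per profile component) is an assumption, since the exact form of the curve is not legible in the text.

```python
import math

CUTOFF_YEARS = 10  # cut-off between "older" and "not older" children, as in the text


def older_children_observation(average_child_age):
    """Return 1 if the average child age is at or above the cut-off, else 0."""
    return 1 if average_child_age >= CUTOFF_YEARS else 0


def logistic_likelihood(h, a, b0, b):
    """Likelihood of a binary observation h given user profile a.

    Assumed form (not taken from the source):
    P(h = 1 | a) = 1 / (1 + exp(-(b0 + sum_q b[q] * a[q]))).
    """
    z = b0 + sum(bq * aq for bq, aq in zip(b, a))
    p = 1.0 / (1.0 + math.exp(-z))
    return p if h == 1 else 1.0 - p
```

Because the OLD item is binary and uses the same kind of likelihood as the attraction items, it can be passed to the same estimation routine without special handling.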

The real-time Bayesian recommendation engine could take 
various forms depending on the context in which it is 
used. The engine described below will specify which of 
a number of items a user should visit next. The
recommendation engine takes a user history and returns 
an item with the highest expected score, and the 
expected score for that item. 

The on-line Bayesian recommendation engine receives a 
set of item profiles B found from a previous iteration 
of the item profile engine. It also receives the 
history h for a user for whom a recommendation is 
required. The index i which matched the user i to 
history h is not used in the recommendation engine 
notation as only one user is dealt with at a time. 

In some instances the history h for a user for whom a 
recommendation is required is advantageously modified 
before being used in the on-line recommendation engine. 
This is the case when the user history records, amongst 
other things, which actions the user has already taken 
and when the recommendations are based on predicting 
which action will be taken next. In this situation, it 
is preferable to modify the user history so that it 
records only information that is known currently and 
that will remain true whatever action the user takes 
next. 

Thus, in the embodiment of the profile engine described
above, the user history records whether or not a user
has taken a plurality of actions, such as for example
whether or not they have watched a movie. Some
observations about the user will not change, whatever
action the user takes next. For example, if a user has
already watched "Titanic" then she will still have
watched it whatever she does next. However, other
observations may change. Thus, for example, a user may
not have watched "Toy Story" but if his next action is
to go and watch it then the observation relating to "Toy
Story" will change. It is undesirable for the user
history to record information that might change
depending on the user's next action and so, the modified
user history should not record any information about
whether or not the user has watched "Toy Story" in order
to overcome the problem.

Thus in general, the prior distribution over possible
user profiles is updated in the recommendation engine
using only information relating to those items for which
a positive observation has been recorded. This is
implemented using a modified user history $\tilde{h}$, defined as
follows:

$$\tilde{h}_j = \begin{cases} 1 & \text{if } h_j = 1 \\ \cdot & \text{if } h_j = 0 \end{cases} \qquad j = 1, \ldots, J$$

where "$\cdot$" indicates that no observation is recorded for item j.

Empirical tests have shown that the use of a modified
user history $\tilde{h}$ in the recommendation engine generates
better predictions.
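
As a minimal sketch of this modification, assuming a user history is held as a dictionary from item name to a 0/1 observation (a representation chosen here purely for illustration), the modified history simply drops every item whose observation is 0:

```python
def modify_history(history):
    """Keep only positive observations.

    Items with observation 0 are dropped, so that the modified history
    records nothing that could change with the user's next action.
    """
    return {item: 1 for item, observed in history.items() if observed == 1}


example = {"Titanic": 1, "Toy Story": 0}
modified = modify_history(example)  # only "Titanic" remains
```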

The recommendation engine uses a prior distribution over
possible user profiles to generate an updated or
posterior distribution by Bayesian inference. Ideally,
the possible user profiles and the prior distribution
are the same as those used by the off-line profile
engine. In practice however, the two distributions may
differ in detail without affecting performance.
Nevertheless there is no distinction between them in the
notation used here.

Thus, as for the off-line profile engine, the prior
distribution over possible user profiles is denoted by
$\alpha(a)$ and $\alpha_q(a_q)$ is the marginal distribution with respect
to characteristic q.



Tests on the performance of the recommendation engine
have indicated that it is sufficient for practical
purposes that the prior distributions used are (possibly
different) discrete approximations to the standard
normal, and that there are sufficient points in the
domain of the prior distribution used by the
recommendation engine. (Five or more points per
characteristic will normally be sufficient.) Thus, in
the preferred embodiment of the recommendation engine a
binomial approximation to the standard normal is used.
Here, the binomial distribution with a sample size of 4
is used and the number of successes is transformed so
that they are distributed evenly about 0 giving:

$$a_q \in \{-2, -1, 0, 1, 2\}$$

$$\alpha_q(a_q) = \frac{4!}{2^4 \, (a_q + 2)! \, (2 - a_q)!}$$

$$\alpha(a) = \prod_{q=1}^{Q} \alpha_q(a_q)$$
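
The binomial prior can be tabulated directly. The sketch below (function names are illustrative, not from the source) builds the five-point marginal from a binomial distribution with sample size 4 and success probability 1/2, shifted to be centred on 0, and forms the joint prior over Q characteristics as the product of the marginals.

```python
from itertools import product
from math import comb


def marginal_prior():
    """Five-point prior on {-2, ..., 2}: Binomial(4, 1/2) shifted by -2."""
    return {k - 2: comb(4, k) / 2 ** 4 for k in range(5)}


def joint_prior(num_characteristics):
    """Joint prior over profiles as the product of independent marginals."""
    marginal = marginal_prior()
    joint = {}
    for profile in product(marginal, repeat=num_characteristics):
        weight = 1.0
        for a_q in profile:
            weight *= marginal[a_q]
        joint[profile] = weight
    return joint
```

With two characteristics this gives 25 candidate profiles, which is small enough to enumerate exhaustively in real time.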



The recommendation engine uses Bayesian inference to
find the posterior distribution over possible user
profiles, $\alpha(a \mid h)$. Standard Bayesian inference leads to

$$\alpha(a \mid h) = \frac{\alpha(a)\, L(h \mid a, B)}{\sum_{a' \in A} \alpha(a')\, L(h \mid a', B)}$$

where L(h|a, B) is the function defining the likelihood
of a user history as defined above in the discussion of
the off-line item profile engine.
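
The Bayesian update can be sketched as below. The representation is an assumption made for illustration: the prior is a dictionary from profile tuples to probabilities, each item profile is a hypothetical `(intercept, weights)` pair used in a logistic likelihood, and the history likelihood is taken to be the product of the per-item likelihoods over the items recorded in the history.

```python
import math


def item_probability(a, profile):
    """Assumed logistic form for P(h_j = 1 | a) with profile (intercept, weights)."""
    intercept, weights = profile
    z = intercept + sum(w * a_q for w, a_q in zip(weights, a))
    return 1.0 / (1.0 + math.exp(-z))


def likelihood(history, a, item_profiles):
    """Likelihood of a history: product over the items recorded in it."""
    result = 1.0
    for item, h in history.items():
        p = item_probability(a, item_profiles[item])
        result *= p if h == 1 else 1.0 - p
    return result


def posterior(prior, history, item_profiles):
    """Posterior over profiles: prior times likelihood, normalised to sum to 1."""
    unnormalised = {a: w * likelihood(history, a, item_profiles)
                    for a, w in prior.items()}
    total = sum(unnormalised.values())
    return {a: u / total for a, u in unnormalised.items()}
```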

After deriving a posterior distribution over user
profiles, the recommendation engine uses this to
calculate an expected score by the user for each item.
This expected score indicates the expected preference
for an item by the user. The underlying assumption of
this method of profile sequencing is that a user's past
choices depend on their preferences. This dependence is
given by the likelihood function for an observation, and
so the expression for the score is based on this
function.

In the preferred embodiment of the recommendation engine
when analysing attendance data, the score for an item is
taken to be the probability that the user has visited
it, given their profile.

Thus $p(j \mid a, B) = f(h_j = 1 \mid a, B)$, where $p(j \mid a, B)$ is the
rating for item j by a person with profile a.

Taking the expected ratings over possible user profiles
then gives:

$$p(j \mid B) = \sum_{a \in A} \alpha(a \mid h)\, p(j \mid a, B)$$

Thus in use, the recommendation engine outputs a set of
preferences of a user for various items. The output is
in pairs of numbers, the first number identifying the
recommended item and the second number giving a score
that indicates how strongly the user is expected to
prefer it.

In the following, J' denotes the set of items in the
data set for which the observation for the user in
question is 0.

The engine finds the item for which the user's expected
rating is highest out of the set of items J'. The item
with the highest expected rating out of set J' is
denoted by $r_1$, and $r_2$ is the expected score for item $r_1$.

Thus, the system recommends an item to the user which
satisfies the following function:

$$r_1 = \arg\max_{j \in J'} p(j \mid B)$$

where

$$J' = \{\, j \mid h_j = 0 \,\}$$

and

$$r_2 = p(r_1 \mid B).$$
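
The selection step can be sketched as follows, under the same illustrative assumptions as elsewhere: profiles are tuples, each item profile is a hypothetical `(intercept, weights)` pair in a logistic score, and the posterior is a dictionary of profile probabilities. Items absent from the history are treated here as having observation 0, which is a choice made for the sketch.

```python
import math


def item_probability(a, profile):
    """Assumed logistic score p(j | a, B) with profile (intercept, weights)."""
    intercept, weights = profile
    z = intercept + sum(w * a_q for w, a_q in zip(weights, a))
    return 1.0 / (1.0 + math.exp(-z))


def expected_score(item_profile, posterior):
    """Expected rating: posterior-weighted average of the score over profiles."""
    return sum(w * item_probability(a, item_profile) for a, w in posterior.items())


def recommend(posterior, item_profiles, history):
    """Return (item, score) with the highest expected score among items observed as 0."""
    candidates = [j for j in item_profiles if history.get(j, 0) == 0]
    if not candidates:
        return None  # nothing left to recommend
    scores = {j: expected_score(item_profiles[j], posterior) for j in candidates}
    best = max(candidates, key=lambda j: scores[j])
    return best, scores[best]
```

The returned pair corresponds to the two output numbers described above: the recommended item and its expected score.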

A numerical example of the off-line profile engine and 
on-line recommendation engine as described above when 
functioning is given in Appendix H. 



In an alternative embodiment of the off-line item 
profile engine to that described above, an alternative 
model is used to estimate the item profiles. 

The alternative model supposes that underlying each 
binary observation is a continuous variable, where the 
observation is positive if the continuous variable is 
above a threshold. Next suppose that the underlying 
continuous variables are generated by a standard normal 
factor model. A common approach to estimating the item 
profiles in standard normal factor models uses the 
correlations between the continuous variables. These 
cannot be calculated directly, since the continuous 
variables are not observed. The correlations can be 
estimated, however, using the tetrachoric correlations 
of the observations.

The reason that this alternative approach is useful is
that there is an equivalence between the logit model
described above and the underlying variable model, in
the sense that they cannot be distinguished empirically.
The parameter estimates in the two models are related by
a simple formula. This means that estimates of the item
profiles from one model can be used as the basis for
item profiles in the other. The equivalence between the
two models is described in detail in chapter 4 of
Bartholomew and Knott (99), "Latent Variable Models and
Factor Analysis", second edition, publ. Arnold, London.

The method for estimating item profiles by first solving
the alternative model is not as efficient as the full
information maximum likelihood estimation method
described previously. It does, however, have the
advantage that the techniques for solving linear factor
models using correlation matrices are widely available
in statistical packages.

The method involves the following steps: 

1. Calculate the tetrachoric correlation matrix for
the observations. This can be done using LISREL.

2. Estimate the standardised factor loadings for a
standard linear factor model using known techniques
based on correlation matrices, treating the
tetrachoric correlations as though they were
product-moment correlations. (Standardised factor
loadings are those that obtain when the underlying
variables are first normalised so that each has
unit variance.) This can be done using LISREL.



3. The factor loadings from step 2 are the item
profiles $\lambda_j$, j = 1, ..., J, for the linear factor
model. Each profile contains a weight for each
component, $\lambda_{jq}$, q = 1, ..., Q. Derive the item
profiles for the binary observation model, $b_j$, j =
1, ..., J, from those for the linear factor model
using the following:

$$b_{jq} = \frac{\pi}{\sqrt{3}} \cdot \frac{\lambda_{jq}}{\sqrt{1 - \sum_{q'=1}^{Q} \lambda_{jq'}^{2}}}, \qquad q = 1, \ldots, Q, \quad j = 1, \ldots, J \qquad (1)$$

where $\pi_j$ = the proportion of observations of item j
equal to 1.

4. There is an exception to the equation (1) above.
In some cases the item profiles from the linear
factor model are such that

$$\sum_{q=1}^{Q} \lambda_{jq}^{2} \geq 1$$

in which case the equation in (1) does not give
sensible results. These cases are known as Heywood
cases. In these cases (in practice whenever

$$\sum_{q=1}^{Q} \lambda_{jq}^{2} \geq 0.99)$$

the relevant part of (1) is replaced with (2)
below.



K 



1/3 



N 



:. <7 = 1 Q, J = 1 J 



(2) 



2 - E (K) 2 



(7=1 
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This follows the suggestion of Bartholomew and 
Knott in section 3.18 of their book. 
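
The conversion in step 3 can be sketched as below. This assumes that equation (1) takes the standard form relating the logit and underlying variable models, in which each loading is scaled by pi/sqrt(3) and divided by the square root of one minus the sum of squared loadings for the item; that form is an assumption here. The replacement formula (2) for Heywood cases is not reproduced, so the sketch only detects such cases using the 0.99 threshold given in the text and raises an error.

```python
import math

HEYWOOD_THRESHOLD = 0.99  # practical cut-off stated in the text


def logit_profile_from_loadings(loadings):
    """Convert one item's standardised factor loadings to logit-model weights.

    Assumed form: b_q = (pi / sqrt(3)) * lambda_q / sqrt(1 - sum(lambda^2)).
    Raises ValueError in a Heywood case, where this form is not usable.
    """
    communality = sum(lam * lam for lam in loadings)
    if communality >= HEYWOOD_THRESHOLD:
        raise ValueError("Heywood case: equation (1) does not apply")
    scale = math.pi / math.sqrt(3.0) / math.sqrt(1.0 - communality)
    return [scale * lam for lam in loadings]
```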

Appendix I gives a numerical example of the use of this
alternative method of the invention.

A practical implementation of the filtering methods of
the invention for the analysis of data is shown in
Figures 3 to 6. A raw set of data showing which of a
range of attractions has been visited by each user as
well as the user's age, how many children they have and
the age of their children is shown in Figure 3. This
data can be entered into a computer program which is
adapted to analyse the data using a filtering method
according to the invention to find item profiles for
each of the attractions and then to generate
recommendations.



In the past, if a marketing executive wished to analyse
a set of data such as that of figure 3, he would have
carried out a pair-wise correlation and picked out items
with a high correlation as being similar to one another.
A pair-wise correlation for the data of figure 3 is
shown in figure 4. For example, he would have
considered Chessington and Thorpe Park having a
correlation of 0.51 (the highest in the data shown) as
being very similar to one another. It will be
appreciated however that this method is relatively
complex and time consuming and that only two items can
be compared at any one time.

With the filtering method of the invention, a first
component of the item profiles for each item can be
plotted on the x axis against a second component of the
item profiles for each item on the y axis. Such a plot
as produced by software implementing the method of the
invention is shown in Figure 5. Of course it will be
understood that information about users which can be
treated as one or more items can be included in these
plots. If the user disagrees with the place on the plot
for a particular item then he can forcibly move it along
in the x and/or y directions. For example, if a major
refurbishment of an attraction had been carried out, it
could be moved on the plot to take account of this.

As shown in Figure 5, the % popularity of each item is
shown by the size of dots representing respective items.
Using the plot of Figure 5, marketing executives can
compare all item profile components if they wish. The
software used can also plot each user in the database
against the item profile components (not shown).

In addition, an item not included in the database could
be added to the graphical representation and then used
in generating recommendations. To do this an operator
would specify an item profile for that item.

Further, the graphical representations generated by the
software can be very useful to a marketing executive's
understanding of data in a dataset. For example, it
could allow them to determine that one item profile
component related to a characteristic of users such as
for example, old fogyness.

As shown in figure 6, the item profiles calculated from
the raw data can be used to predict which attractions a
user will like by the filtering method of the invention.
The software uses this information to plot a campaign
map as shown in figure 6 which shows where groups of
users having similar profiles are situated relative to
first and second brand values or item profiles plotted
on the x and y axes respectively. When planning an
advertising campaign for example, the campaign map of
figure 6 could be used to determine which groups of
users should be targeted. As shown, the size of dots
plotted on the campaign map could show the number of
users falling into each group or cluster.

The filtering method of the invention provides a 
predictive technique that builds, estimates and uses a 
predictive model of the observations relating to a case 
in terms of a profile for that case that includes hidden 
metrical variables. The method can be used for: 
predicting which of a number of items is most likely to 
arise next; or, predicting the values of a number of 
missing observations. 



The method can be applied to tasks that fall within the 
heading of analytics, marketing automation and 
personalisation.

The method can be used as a method of filtering data to 
predict the suitability of an object, or the relative 
suitability of an object, compared to other objects, for 
a customer. 

Predictions about the suitability of an object for a 
customer (or prospect) can be used for personalisation 
and, in particular, as the basis of making 
recommendations to her or concerning her likely 
preferences or interests. 



Recommendations can be part of an explicit process in 
which the customer elects to enter into a process of 
providing information in order to receive 
recommendations. 



Alternatively recommendations can be part of an implicit
process in which information about the customer's
activities is used to generate the recommendations and
suggestions are made unprompted. Examples would be
cross-sell suggestions made by a call centre operative,
personalised web pages, or e-mail or direct mail
suggestions.

One application is where an administrator wants to 
suggest content or products to a customer based in part 
on what content or products she has already rated or 
sampled. In this case the items will be the set of 
possible things that may be rated or sampled. The 
method would be based on the concept of suggesting that 
thing which is likely to be most suitable. 

To make recommendations the following steps are
implemented.

Generate a predictive model of the suitability of items 

1. Specify the data 

Identify the items that recommendations might be about. 
Examples of items that might be recommended are: 

• products and services 

• content (eg web pages) 

• holiday destinations, movies, books, etc 

• courses of action 

Identify a data set of observations that can be used to 
predict the suitability of the items. Data can be 
gathered from a number of sources including: 

• from a website 

• by questionnaire or survey 

• by phone 

• from bank records, store card records or other
sources of transaction history

• customer service records

• loyalty card records

• obtained from third party sources
The data must include direct information about the
suitability of various items for customers. Examples of
the observations about the suitability of items are:

Visits to web pages. Assume that customers only visit
web-pages that are suitable. One possible
implementation is that different sessions are considered
as being different records. Another is that all
sessions for a user are aggregated into the same record;

Explicit ratings of the suitability of items by
customers. This is used for example on the MovieCritic
website;

Customer purchase history. Assume that customers only
buy items that are suitable; or

What items have customers selected in the past (e.g.
what movies have they seen, where have they been on
holiday). Assume that customers only select items that
are suitable.

The data may also include covariates, i.e. observations
that might be informative about a customer's
preferences, but which are not directly about the
suitability of items. Examples of observations which
are covariates are:

answers to questions, either just from this visit
to the website, or combined for all visits;

responses to "exogenous standards". Examples of
these are a photograph of scenery for holiday preference
selection or descriptions of TV programmes for book
preference selection. The exogenous standards used can
be in multi-media and include any form of graphic image,
photograph, sound or music as well as a conventional
passage of text, a name or other written description;

customer contact data logged by sales and/or
customer service staff in respect of customer
interactions (e.g. telesales, emails, face to face),
including both objective data (e.g. call duration and
time) and subjective assessments (e.g. categorising call
purpose, customer satisfaction etc.); and

demographic, geographic, behavioural and other
information about the customer.

2. Model the data

3. Estimate the parameters of the item models

Make recommendations to customers

Depending on the context, this may be a batch if the
context is a mail shot or similar; alternatively it may
be one customer if the context is a web-site or call
centre etc.

For each customer the following steps are carried out.

1. learn about the customer from observations about
her

Observations about the customer may include observations
about the suitability of some items and about
covariates. Use these observations, together with the
item models estimated at the previous step, to learn
about the customer's profile.

2. make predictions about the suitability of items

Use knowledge of the customer's profile, together with
the item models, to predict the suitability of items for
that customer. Predictions can be made in respect of:
all items which have not been previously selected by
the customer; or those unselected items which are not
excluded by business rules.

3. make a recommendation

Recommendations are made based on the predicted
suitability of items. Examples include:
recommend the item most likely to be suitable; or
adjust the suitabilities in the light of business rules.

Contexts in which recommendations can be made to
customers include any touchpoint between the customer
and supplier, including:
online, as part of an e-commerce site or an
Internet site holding information; by sales operatives
in call centres/contact centres; by sales staff in shops
and other face to face arenas; by e-mail and post;
digital interactive TV; and personalised newsletters,
mailshots or brochures.

The personalisation will be related to particular items 
in the document and may be implemented using a print 
technology that can create customised documents. A 
specific implementation is in the management of 
selective binding programs.



The recommendations could be notified to the
end-customer (possibly via a third party such as the
provider site operator or a call centre staff member).

Alternatively some or all of the output may be made 
available solely to one or more third parties (such as a 
provider) and not to the end-customer. This might be 
useful for commercial purposes such as for example 
content management or advertising personalisation. 



The observations about a customer from different
channels can be aggregated into a single set. To do
this the client implementing the Profile Sequencing
system will need to ensure that identification
procedures recognise the customer no matter what channel
she uses.

The method of the invention enables some additional 
features to supplement the basic personalisation task. 
These have additional benefits. 

Generating and viewing item profiles

The filtering method generates a profile for each item.
Item profiles may automatically be updated periodically
by recalculation to incorporate any new data that has
been acquired since the last calculation. Recalculation
can be done arbitrarily frequently, including in real
time, as new data is acquired.

In many cases the item profiles can be used to generate
knowledge of the relationship between the items, or of
the items themselves. It will frequently be the case
that the components of the profile are interpretable by
marketing executives in terms of meaningful variables.

One implementation could be as a software component that
allowed the system administrator to view a graphical
representation of the item profile map showing the item
profiles as points in a profile space, with one axis for
each component. Where preference data is gathered, this
profile space can be considered as effectively
equivalent to a machine generated product position map
or, as the case may be, brand position map, otherwise
known as a perceptual map. (However, it will be noted
that the map will have been generated using the
objective and quantified analysis of observed consumer
preferences, rather than through the use of subjective
consumer surveying.) The interface could allow the
administrator to use their skill and judgement to
interpret the components, and to attach their own
labels, identifying the brand or product values (which
may correspond to product or brand attributes) to the
components, which can then be used to refer to the
relevant components.



Additional features include: data points on a plot of 
item profiles could indicate the item popularity, for 
example using size or colour; filters could be used to 
show graphically how popularity differs, for example 
between those customers who have young children and 
those who do not, between those customers who have seen 
"Titanic" and those that have not; and profiles using 
different sets of historical data could be shown on the 
same plot to indicate changes over time in positioning 
of items.



These profiles may also be used to sort items into 
groups or clusters by comparing the item profiles and 
placing all those items having similar profiles into one 
group or cluster. 

Analysing the item profiles in any of these ways may be 
useful because: 



by illuminating the basis on which recommendations 
will be made the analysis may generate understanding and 
trust that the recommendations will be sensible, and so 
encourage use of the system; the analysis of the item 
profiles can be used as the basis for modifying the 
behaviour of the system; and knowledge of the 
relationship between items may itself form the basis of 
other marketing initiatives that do not depend on 
personalising marketing messages to customers. 
Generating customer profiles

Profile Sequencing provides a method for ascribing a
profile to a customer, based on her behaviour. Customer
profiles may automatically be updated periodically by
recalculation to incorporate any new data that has been
acquired since the last calculation. Recalculation can
be done arbitrarily frequently, including in real time,
as new data is acquired. This allows recommendations to
be updated, using the updated profiles (together with
updated item profiles if relevant), arbitrarily often,
including in real time if desired. One convenient way
of displaying customer profiles is by a graphical
representation of the customer profile map in which the
customer profiles relating to any given set of items are
plotted as points in a profile space with one axis for
each component (the components corresponding to those
determined for the relevant set of items). Where there
are a large number of customer profiles to be mapped,
these may alternatively be depicted by some form of density
mapping (e.g. contour chart, colour coded profile
density map or simulated 3D representation (with the
third dimension representing the density value)). Where
customer profiles are mapped against item attributes,
relevant items (and, if appropriate, other objects e.g.
messages, demographic categories etc.) may be
superimposed on the plot as a convenient means of
understanding the inter-relationship between the items
and customer preferences. These profiles may be used to
sort customers into groups or clusters by comparing the
customer profiles and placing all those customers having
similar profiles into one group or cluster. These
groups can be used as the basis for targeting marketing
campaigns.

Customer profiles may be calculated at large across the
whole population about which there is relevant data.
Alternatively, the profiles might be restricted to some
subset by first filtering by one or more criteria (e.g.
demographic, geographic or behaviouristic criteria).
These filtered profiles may then be displayed in exactly
the same way as described above for the population as a
whole.

Combining filtering with rules 

In some cases the administrator may want to restrict the
set of objects that might be recommended to a customer,
or might want to otherwise modify the pattern of
recommendations or other forms of personalisation (e.g.
messaging, content). The following are illustrative
examples of such situations.

Restrictions may be based on rules operating on some of
the observations about that customer. For example "do
not recommend products that do not satisfy objective
requirements specified by the customer".

Restrictions may be based on commercial considerations
such as "do not recommend products that are out of
stock".

Modifications to the pattern of recommendations may be
based on commercial considerations under which objects
that carry a higher commercial benefit, or which form
part of a special promotion, are more likely to be
recommended.

To accommodate these situations the Recommendation
Engine can include additional steps that may include the
following.

35 A list of restricted objects is passed to the 

Recommendation Engine and the predicted suitability is 
calculated only for objects that are not restricted. 
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A list of weights is passed to the Recommendation Engine 
that is used to weight the calculated predicted 
suitabilities of the objects, and the object with the 
highest weighted suitability is recommended. 

5 

If object profiles include a term that reflects the 
general popularity of the object, then the 
Recommendation Engine can accommodate these situations 
by using modified object profiles in which the 
10 components representing popularity for the different 
objects are adjusted until the pattern of 
recommendations is as desired. 
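
The restriction and weighting steps can be illustrated
with a minimal sketch. The function name `recommend`, the
`predict` callable and the default weight of 1.0 are
assumptions made for illustration only; they are not part
of the Recommendation Engine described here.

```python
def recommend(objects, predict, restricted=(), weights=None):
    """Return the unrestricted object with the highest weighted suitability.

    objects:    iterable of candidate object identifiers
    predict:    callable giving the predicted suitability of an object
                (hypothetical stand-in for the filtering method)
    restricted: identifiers that must never be recommended
    weights:    optional mapping of identifier -> weight (assumed default 1.0)
    """
    blocked = set(restricted)
    weights = weights or {}
    best, best_score = None, float("-inf")
    for obj in objects:
        if obj in blocked:
            continue  # suitability is not calculated for restricted objects
        score = predict(obj) * weights.get(obj, 1.0)
        if score > best_score:
            best, best_score = obj, score
    return best  # None when every object is restricted
```

For example, an out-of-stock list can be passed as
`restricted`, and a promotion can be favoured by giving the
promoted object a weight greater than one.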

Communicate with only a subset of customers 

In some cases the administrator may wish to use profile
sequencing to target a number of prospects from a longer
list for direct marketing purposes (e.g. mailshot,
personalised email or outbound telesales). This can be
accommodated by assessing the probability of interest
using profile sequencing for each prospect in turn and
then:

If all those above a certain threshold of interest are
to be targeted, rejecting all prospects that fall below
the assigned probability of interest whilst passing
forwards the remainder for further processing (if
further criteria for targeting are to be applied) or for
despatch of the marketing material to them; or

If only a pre-set number of prospects are to be
targeted, ranking all prospects in order of probability
of interest and then discarding all those ranked below
the pre-set number.
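
The two selection rules can be sketched as follows,
assuming the probability of interest has already been
assessed for each prospect and is held in a mapping. The
function names and the mapping are hypothetical.

```python
def select_by_threshold(interest, threshold):
    """Keep every prospect whose probability of interest is not below the threshold."""
    return [p for p, prob in interest.items() if prob >= threshold]


def select_top_n(interest, n):
    """Rank prospects by probability of interest and keep the first n."""
    ranked = sorted(interest, key=interest.get, reverse=True)
    return ranked[:n]
```

The first function corresponds to targeting all prospects
above a threshold; the second to targeting a pre-set
number of prospects.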

Similarly, the administrator may wish to make a certain
promotion or display particular content on a website
(including a mobile enabled website) or interactive TV
channel only if the level of interest predicted for the
recipient is over a certain threshold. In this case
also profile sequencing can be used in real time for
each user/viewer to assess whether the assigned
probability of interest is reached, rejecting all
viewers/users with a lower forecast probability of
interest.



Another manifestation of the use of rules to modify
profile sequencing output is to pre-filter the sample
set by administrator specified demographic, geographic
or behaviouristic criteria so that recommendations are
only generated for prospects that are pre-qualified by
one or more of the criteria. This pre-qualification
would be particularly useful in managing personalised
advertising or direct marketing campaigns.

A further form of restriction that the administrator may
wish to apply to modify profile sequencing output is,
prior to using profile sequencing, to rank or group
customers (or prospects) according to their economic
attractiveness as customers and to restrict or modify
marketing effort to each customer according to their
economic ranking or grouping. Economic ranking or
grouping can be carried out using customer scoring or
any other appropriate standard technique. After ranking
or grouping, personalised marketing using profile
sequencing can, for example, be restricted to the n
most profitable customers or to customers exceeding some
arbitrary profitability. Alternatively, extra
inducements (eg. special promotions) may be restricted
to more profitable customers using profile sequencing to
determine for example which, out of those customers, the
promotions should be aimed at or which promotion should
be targeted at which customer.

Changing item profiles

One way for system administrators to affect the pattern
of recommendations is to override some or all of the
machine-generated item profiles. This may be useful if,
for example:

the administrator feels that the machine-generated item
profiles are misleading;

one of the items has been rebranded so that its profile
is not well modelled using past data;

the system administrator wants to modify the proportion
of recommendations to the different items, to reflect
commercial considerations; or

the system administrator wants to affect the pattern of
"competition" between items so as to favour some items
at the expense of others, since the actual
recommendation made by the system will depend on the
pattern of profiles.

This control can be effected by allowing the 
administrator to override the components of an item 
profile. One implementation could be via a graphical 
interface. A convenient implementation is one that 
allows the administrator to "drag and drop" the item 
from one place in profile space to another. In this 
implementation, the item profile corresponding to the 
selected position on the graphical interface would be 
automatically calculated and that profile substituted 
for the original one. Depending on whether the
administrator wanted to make a permanent change or alter
the profile for one particular purpose only (e.g. model
a scenario or run a particular campaign), the changed
profile could be treated as either a local value only or
as a global change.

Adding new items

When adding new items the administrator may impose an
initial item profile, or may rely on a default initial
profile (for example that each component in the item
profile has a neutral value such that the predicted
suitability for a customer is the same regardless of the
customer's particular profile). Over time the system
will collect observations about the new item.
Components in the initial profile may be replaced by
free parameters, when there is sufficient data, that
give a better fit to the data. Statistical methods of
model selection can be used to determine when there is
sufficient data.

The interface for end-customers

Features of the customer interface at which the customer
enters observations, such as a website, may include the
following:

the interface is arranged such that the customer may
choose which items to rate or otherwise provide
information on (eg. by responding to multiple choice
questions) and in what order to rate or provide
information on them;

updated recommendations are presented to the customer
each time she provides a further observation. This will
further encourage the customer to input information as
she will obtain a direct result by so doing;

each time the customer provides a further observation
she is presented with one or both of:
o updated recommendations;
o an indication of the level of personalisation
of the recommendations. The indication of the
level of personalisation could for example be
provided by graphical means, for example a
sliding scale, representing a personalisation
score. One way to derive a personalisation
score would be by determining the average
variance of the probability distribution over
each component of the profile for the customer
in question.
This feedback will encourage the customer to enter more
observations; and

if the interface is a website then the inputting of
information is carried out on the same page on which the
personalisation level indicator and the recommendations
are displayed.
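
A personalisation score based on average variance might
be computed along the following lines. This is only a
sketch: it assumes that the probability distribution over
each profile component is represented by a list of samples,
and the mapping of the average variance onto a 0 to 1
scale using a `prior_variance` parameter is a hypothetical
choice made for display purposes.

```python
from statistics import fmean, pvariance


def average_profile_variance(component_samples):
    """Average, over profile components, of the variance of each component."""
    return fmean([pvariance(samples) for samples in component_samples])


def personalisation_score(component_samples, prior_variance=1.0):
    """Map average variance to [0, 1]; lower variance gives a higher score.

    prior_variance is an assumed reference value: the variance expected
    before the customer has supplied any observations.
    """
    avg = average_profile_variance(component_samples)
    return max(0.0, min(1.0, 1.0 - avg / prior_variance))
```

As the customer enters more observations, the distribution
over her profile narrows, the average variance falls and
the score shown on the sliding scale rises.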

The filtering method of the invention can, without
limitation, be conveniently used to automate the
planning and execution of marketing campaigns.
Predictions about the suitability of an item can be used
to identify to which customers a particular
recommendation should be made. This may, for example,
be used when promoting a particular item.

Predictions can also be used to identify the customers
for which one of the available suggestions is most
suitable. This may be used when choosing to which
customers recommendations should be made.

The administrator may want to communicate messages not
currently included as items in the database. Here a
message means information in whatever format relating to
items to be marketed that is designed to inform,
interest, excite and/or stimulate or support a desire to
acquire in the recipient. Examples include
advertisements, editorial material, newsletter content,
images, sounds, music, video content, presentations etc.
It also includes information or recommendations
regarding new products/services. The administrator may
either want to select who out of a set of customers to
communicate a given message to, or may want to
communicate different messages to different customers
within a given set. Example tasks where this would be
useful include:



promoting an item using a range of marketing messages or
images designed to appeal to different kinds of customer,
for example through a direct marketing campaign;

promoting an object or objects not in the database;

personalising web-site, PDA, brochure, newsletter,
mailing etc. content (ie. content management); and

personalising the selection and/or content of relevant
advertising (through whatever media capable of
supporting personalisation).

Messages may be communicated over any touchpoint between
the customer and the supplier.



Existing methods for communicating messages not in the
database are limited. The administrator can:

use a machine learning based clustering routine to
identify clusters of customers, look at the pattern of
their behaviour in order to assess their "brand values",
and then choose the appropriate message to send to each
cluster. In many cases, however, there are few or no
meaningful clusters in the data;

specify rules to determine which message to send to each
customer. This can be hard when the range of possible
customer histories is large, as there may be no
intuitive way to distinguish groups on the basis just of
rules applied to their histories; or

manually identify market segments, devise rules to
assign customers to segments, and choose an appropriate
message for each segment. This has the same problem as
above: when the range of possible customer histories is
large there may be no intuitive way to distinguish
market segments.

Profile Sequencing enables an alternative approach. 
Profile Sequencing could be implemented in a software 
package that allowed the following process: 

Another application is where an administrator wants to 
identify suitable customers to target with a particular 
message (or which customers should be targeted with what 
message) and where the message is not currently 
something on which the administrator has data. A method
would be:



• Identify a set of covariates on which there is
data.

• Treat at least some as items.

• Use a filtering method of the invention to work out
item profiles for these using the data.

• Estimate a case profile using observations of the
covariates using a method of the invention.

• Predict suitability for each of the messages using
a method of the invention.

• Implement some rule, for example "send the message
most likely to be preferred" or "send the message
if the likely preference is >0.5".



In more detail, preferably the last three steps listed
above comprise:

• Specify models of the items. Suitable functions
would be monotonically increasing functions of a linear
function of the case profile, where the coefficients on
the case profile components are the item profile
components, and where the fixed term is also an item
profile component. Examples of these are described on
page []

• Estimate the item profiles using the filtering
method of the invention.

• Create a binary variable, one for each message, and
set up item models for them using the same function
family as for the other items.

• Allow the administrator to specify the item
profiles for the messages, possibly after analysing the
item profiles for the other items, possibly using a
graphical interface.

• To determine whether and how to target a case:
learn about (estimate, whether as a point or a density)
the case profile from observations of the covariates
treated as items; predict the suitability of each
message using the method of the invention and the item
profiles specified above; implement some rule, for
example "send the message most likely to be preferred"
or "send the message if the likely preference is >0.5".

An example of this process is:

Send out messages to customers in the database using the
Profile Sequencing recommendation engine to identify
which message is most likely to appeal to each customer,
given the customer's profile, which is learnt from their
observations, and the item profile of the message, which
has been specified by the system administrator.
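
The targeting step can be sketched as below. The logistic
function is used here as one example of a monotonically
increasing function of a linear function of the case
profile; that choice, the function names, and the
convention that the first element of an item profile is
the fixed term are all assumptions for illustration.

```python
import math


def suitability(case_profile, item_profile):
    """Logistic function of a linear function of the case profile.

    item_profile[0] is the fixed term; item_profile[1:] are the
    coefficients on the case profile components (assumed layout).
    """
    z = item_profile[0] + sum(
        a * b for a, b in zip(case_profile, item_profile[1:])
    )
    return 1.0 / (1.0 + math.exp(-z))


def choose_message(case_profile, message_profiles, threshold=0.5):
    """Send the message most likely to be preferred, if its preference exceeds the threshold."""
    if not message_profiles:
        return None
    scores = {
        name: suitability(case_profile, profile)
        for name, profile in message_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None
```

Here `message_profiles` holds the item profiles specified
by the administrator for each message, and `case_profile`
is a point estimate of the customer's profile.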

Another application for Profile Sequencing is in media
buying and selling and in the development of media
plans. Personalisation applications rely on a database
of customer records, where each record lists
observations about the customer. In a media buying and
selling application the database would be of advertising
campaign records, where each record lists the media on
which the advertising campaign (or individual
advertisements) was carried, together optionally with
further information (such as, for example, the individual
advertisement used, the date, time, position, length and
prominence etc.). Possible media would include but not
be limited to: different newspapers and magazines;
advertising slots on different television and radio
programmes; cinema/video; internet sites; WAP and other
mobile channels; billboards; sports stadia; point of
sale; bus/taxi; and commercial sponsorship.

The application uses the database to generate item 
profiles for the different media. It could then: 

generate knowledge about the product/brand values (which
may be regarded as attributes) of different media. The
interface could plot the item profiles as points in a
profile space, with one axis for each component. This
profile space can be considered as a machine generated
media position map. The interface could allow the
administrator to use their skill and judgement to
interpret the components, and to attach their own
labels, identifying the value or attribute, to the
components, which can then be used to refer to the
relevant components. Such maps might, as convenient, be
each confined to one media class (eg. TV programmes,
newspapers etc.) or incorporate multiple types of media
in a single map; and/or

suggest combinations of media (or, as the case may be,
individual publications, programmes, types of event
etc.) to use for new advertising campaigns, optimising
the media mix. The user would specify the item profile
of the campaign (or separately each element of the
campaign), possibly by "dragging and dropping" the
campaign (or campaign element) onto the position map(s).
The application would then list those media (or
individual publications etc.) most likely to have
carried a campaign (or campaign element) with that
profile.

This functionality could be used, for example, by
sellers of advertising space, media buyers, advertising
agencies, marketing departments and consultancies and
business analysts.

It could also track and display changes in the media
profiles over time (as described for item profiles more
generally below). This could be useful to determine and
forecast trends in the positioning of individual media
publications etc., and in the media more generally.

A further application of the filtering method of the 
invention is as a tool to facilitate product or brand 
management. The database in this case could be the same 
one as is used in a marketing automation function. 
Alternatively it could be collected separately. Unlike 
for marketing automation applications, there is no need 
to be able to identify customers since there will not be 
any future communication with them. This can simplify 
the data acquisition process. 

But it is an advantage of the method that exactly the 
same model is used for brand management as for 
personalisation and targeting, so that a single view of 
brands and so on can be used across many disparate
tasks.

The data will contain customer records. Records may 
contain information about a number of things including: 

what products they have bought; preference information
about products; answers to questions; demographic
information; geographic information; and behavioural
information (including what products are bought).

A product or brand management application could:

derive item profiles for the data. These will include
in particular item profiles for the different products
and/or brands;

plot, via the interface, the item profiles as points in
a profile space, with one axis for each component. This
profile space can be considered as a machine generated
position map. The interface could allow the
administrator to use their skill and judgement to
interpret the components, and to attach their own
labels, identifying the values (which may be regarded as
attributes), to the components. These labels can then be
conveniently used to refer to the relevant components.
This can generate marketing relevant information such as
identifying if products have values or attributes in
common;

allow the administrator, via the interface, to run "what
if" scenarios, for example to examine what the effect
on sales is likely to be if: one product is rebranded,
where the rebranding is specified in terms of a changed
item profile; one or other market expansion strategy
were to be followed; it is proposed to establish or
reposition a brand, in which case the optimum
positioning can be explored; there is a demographic
shift; or a new product or brand enters the market with
particular attributes, where the product/brand
attributes are quantified (either using market research
or by some other means eg. the administrator's own skill
and judgement) and entered as an item profile. This
could form the basis of a tool to identify "gaps" or
market opportunities that could be exploited by new
products/brands.

Other useful product/brand management applications
include the following tasks:

forecasting the parasitic effects on other products of
advertising or otherwise promoting one of a number of
products (whether these be competitors' products or the
producer's own);

psychographic (or behaviouristic or demographic or a
combination of these) segmentation on the basis of the
customer profile position map;

predicting cannibalisation effects on the introduction
of new product(s) according to product positioning;

forecasting effects of planned product obsolescence or
product elimination (including as part of a product line
pruning or retrenchment exercise) on sales of related
existing and new products;

promotional impact on product sales of advertising
campaigns according to positioning of advertising
message(s);

planning product/brand development strategies on the
basis of product/brand positioning information;

developing product differentiation strategies using
information on relative product positions in the
position map;

forecasting demand in respect of introduction of new
products (including product extensions and product line
stretching) and optimising new product positioning;

optimising new brand development (using information
regarding brand attributes of existing competitor brands
and customer profile positioning in that space to select
an appropriate attribute mix for the proposed new brand);

optimising the positioning of flanking products or
brands;

modelling the effects of proposed repositioning of
products (or, as the case may be, product lines or
brands), for example due to product or brand
modernisation or product modifications;

assessing product mix consistency through observation of
the relative positions of products on the position map
and, if appropriate, modelling the effects of potential
changes (eg. repositioning of existing products,
elimination of products or introduction of new products)
to optimise forecast demand. Where the product mix
shares a common branding this modelling will also form
an important part of brand management and development;

planning product modification through forecasting the
predicted effects on demand through the associated
expected repositioning of the product;

planning brand repositioning/revitalisation/revival
through reassessing the predicted effects on demand from
the proposed new position(s) on the brand position map;

assessing the suitability of prospective brand
extensions or brand leverage by comparing the brand's
positioning with the positioning of the product to be
brought within the brand (or, if a new product, the
positioning of representatives of that product
category);

quantifying product/brand image and, through the use of
trend analysis, carrying out attitude tracking over time
on that product/brand, particularly for use for
management control and predictive purposes; or

as a tool for planning, controlling and assessing
marketing tests or campaigns (eg. for assessing whether
marketing objectives associated with product or brand
positioning have been met).

Analytical tasks, such as those highlighted above in the 
context of product and brand management, can be run 
arbitrarily often (including in real time if desired) to 
reflect changes with time (or as additional information 
is gathered) in the subject matter being analysed. This 
can be done automatically by recalculating the profiles 
underlying the analysis arbitrarily often, including any
new information that has been gathered.

The filtering method of the invention can be used in
support of automated product configurators. It can be
used (possibly in conjunction with other fact-based
expert systems) to predict which amongst numerous
product configurations or variants would appeal most to
a prospective customer. The most appealing product
configuration can then be presented to the prospective
user automatically at an early stage as a pre-configured
product option customised to that customer's needs.



The method of the invention can also be used as a method
of analysing data to: predict whether an observation
about one particular item is likely for a case; and
possibly also to investigate whether there are different
reasons associated with the observation being likely; and
possibly also to target cases for which the observation
is likely, possibly depending on the different reasons.

One example is where companies want to manage customer
attrition, or churn. Another is whether the customer is
likely to generate a lot of revenue for a supplier and
so be a particularly valued customer. Although the
description that follows is in the context of attrition
management it will be understood that the description
could equally apply to other examples.

The aim of attrition management is to: 

• Identify which customers are likely to close an
account.

• Target customers according to any differences in
the underlying reasons why they are likely to close
an account.

Data that might be useful in predicting behaviour can
include but is not limited to:

demographic information; purchase patterns; information
from customer service records; and information provided
explicitly by the customer.

The method for predicting whether a customer is likely
to churn involves the following steps.

1. treat all the pieces of information, including the
event that the customer churns, as items;

2. use the filtering method of the invention to work
out item profiles for these using the data;

3. make predictions about whether or not a customer is
likely to churn using the method of the invention.
The difference is that instead of working out the
likelihood that the customer will choose each of a
range of unchosen objects, only the likelihood that
the user will choose the item "churn" is worked out.



One method for investigating the different reasons for
attrition is to:

• Specify a binary variable stating whether a
customer closed an account as an item.

• Identify a set of covariates which might be
informative about a customer's attrition behaviour and
treat at least some as items.

• Specify models of the items. Suitable functions
would be monotonically increasing functions of a linear
function of the case profile, where the coefficients on
the case profile components are the item profile
components, and where the fixed term is also an item
profile component. Examples of these are described on
page []

• Estimate the item profiles using the filtering
method of the invention.

• Identify those items which are signals of attrition
- these will be those for which case profiles that give
a high likelihood of the item being selected or having a
high value will also have a high likelihood of
attrition.

• Investigate, possibly visually, whether these
signals of attrition all have similar profiles, or
whether their profiles differ, indicating different
reasons associated with attrition.

• If desired, target messages to customers with a
high propensity to attrite, possibly according to the
different reasons associated with attrition, by
specifying profiles for the messages that are similar to
those of the signals of interest.
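
One possible way to pick out signals of attrition is
sketched below. It assumes that each item profile is given
as a list of coefficients on the case profile components
(excluding the fixed term), and it uses cosine similarity
with the attrition item's profile as a proxy for "case
profiles that make this item likely also make attrition
likely". The similarity measure, the cut-off and the names
are assumptions and not part of the method as described.

```python
import math


def cosine(u, v):
    """Cosine similarity between two coefficient vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def attrition_signals(item_profiles, churn_item="churn", min_similarity=0.5):
    """Items whose profile points in a similar direction to the attrition item."""
    churn = item_profiles[churn_item]
    return sorted(
        item
        for item, profile in item_profiles.items()
        if item != churn_item and cosine(profile, churn) >= min_similarity
    )
```

Two items can both qualify as signals while having quite
different profiles from each other, which is the situation
in which different reasons for attrition may be suspected.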



One method is to:

• Specify a binary variable stating whether a
customer closed an account as an item.

• Identify a set of covariates which might be
informative about a customer's attrition behaviour
and treat at least some as items.

• Do steps M through B.

• From the item profile for attrition, identify which
components in a case profile are indicative of a
high propensity to attrite. Where models depend on
the sum over the components $q = 1, \dots, Q$ of the
case profile component multiplied by the item profile
component $b_{jq}$, then these components will be those
$q > 0$ with a high $b_{jq}$.

• Analyse the other item profiles, possibly visually,
and apply skill and judgement to decide what
message is appropriate to customers likely to
attrite depending on which components of their
profile indicate propensity to attrite. For
example, if a high component 2 is indicative of
attrition, can we learn from looking at other items
where component 2 scores highly what "reason" this
component indicates?

• Implement targeting of the customers by the method
described above.



The method can be used to assess the likelihood of churn
in the manner described above for each customer at
arbitrary periodic intervals (including in real time)
and, where a churn likelihood over a given threshold
probability is detected, either alert the administrator
to this or automatically select the marketing response
predicted most likely to avert churn (treating the
responses in the same way as messages as described
above) and trigger suitable pre-emptive action. This
process may be used in conjunction with rules to
restrict which marketing responses will be considered by
profile sequencing dependent on the economic value of
the customer.



It is assumed that there are considered to be different
reasons for churn that cannot be observed directly.
Profile Sequencing can be used to distinguish these
reasons. This can be useful because the marketing
response to a customer who is disgruntled and is
considering moving to a competitor is very different to
that for one who is liquidating assets to invest.

Another method is to use a priori knowledge about the
reasons for attrition. For example modify the previous
method as follows:

1. decide what the reasons for churning might be;

2. decide which items are indicative of which reasons;

3. associate each reason with a component in the item
profile;

4. require that the case profiles are estimated so
that they have as many components as reasons, and
that items have non-zero values for a component in
their profile only where the item is indicative of
the reason associated with that component.

The filtering method of the invention can be used to
alert operators of potentially fraudulent transactions.
The basic idea is to build a model that relates various
indicators of the pattern of a customer's transactions
to their profile. A customer's profile is learnt from
their past transactions, and when a new transaction
occurs the system looks to see whether it is unusual
given the customer's profile.

The advantages of using the filtering method for this
task are that:

a very large number of similar variables can be used as
part of the same predictive model. Traditional
predictive models include variables directly in the
predictive equations. If there are very many of these
then traditional models cannot identify the separate
effects of each, and will not be able to estimate the
equation parameters. With the method of the invention
on the other hand only the customer's profile and
possibly some covariates enter into the item models.
Because each equation has only a small number of
arguments, there is no need to ignore any variables.

The system can be used by, for example: financial
services companies (eg. banks, credit card companies
etc.); or telecommunications companies.

It can be used in a retail context to detect fraud by
individuals, in a commercial context to detect fraud by
companies, public authorities or other commercial
entities, or by commercial entities (eg. banks, shops,
other companies, public authorities etc.) to alert
against fraudulent transactions made by an employee on
the entity's behalf.

In using the method of the invention to detect
potentially fraudulent transactions, the process
requires data on transactions so that unusual ones can
be spotted.

In the context of detecting credit card theft a system
might consider: strange withdrawals; strange payees;
strange time of day.

In the context of mobile phone theft a system might
consider: frequency of phone use; unusual numbers of a
phone.

Using the knowledge of the customer's profile, it is
predicted how likely the observed transaction would be.

If the probability is sufficiently low, then someone is
alerted to take a closer look.
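
The alerting rule can be sketched as follows. The sketch
assumes that each transaction indicator is a binary item
with a logistic item model, that the indicators are
treated as independent given the customer's profile, and
that the first element of an item profile is the fixed
term. These modelling choices, the function names and the
default threshold are assumptions for illustration.

```python
import math


def indicator_probability(case_profile, item_profile):
    """Probability that a binary indicator is observed, given the profile."""
    z = item_profile[0] + sum(
        a * b for a, b in zip(case_profile, item_profile[1:])
    )
    return 1.0 / (1.0 + math.exp(-z))


def transaction_probability(case_profile, item_profiles, observed):
    """Probability of the observed indicator pattern (independence assumed)."""
    prob = 1.0
    for name, profile in item_profiles.items():
        p = indicator_probability(case_profile, profile)
        prob *= p if observed[name] else 1.0 - p
    return prob


def should_alert(case_profile, item_profiles, observed, threshold=0.05):
    """Flag the transaction for a closer look if it is sufficiently unlikely."""
    return transaction_probability(case_profile, item_profiles, observed) < threshold
```

A transaction whose indicators match the customer's usual
pattern receives a high probability and passes; one that
combines several unusual indicators falls below the
threshold and is referred to an operator.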



In one embodiment, a computer software product for
carrying out the filtering method of the invention could
be supplied to customers to be used with data that they
themselves obtain.

An alternative is to use the method to supply analysis
and marketing automation tasks as a service, possibly
over an extranet. Clients may send their data to the
service provider, and would receive from them analytics
results or inputs for marketing automation.

One example may be where the service provider receives
from the client a set of observations about a customer,
and returns predictions about the suitability of
objects. Depending on the commercial arrangements the
customer database used by the filtering engines could
contain: observations about customers that are pooled
from different clients, or only observations about
customers that are supplied by the client in question.

If observations are pooled from different clients, then
there is the possibility that predicted suitabilities
for a customer can be based on observations about her
gathered from all those client sites that pool their
data. To implement this the clients would need to
implement identification policies that allowed customers
to be identified no matter what participating site they
were on.

In other cases observations can be pooled from different
clients, and yet predicted suitabilities for a customer
can be based only on observations made by the
client making the request. In this case customers would
have different identities for each participating client,
and would have one record in the customer database for
each different identity.

Intermediate cases are possible, in which for example
some clients provide their data to the pool and get
predicted suitabilities that benefit from all the data in
the pool, while others benefit from the pool but do not
supply their own data into it, or in which arrangements
differ for different classes of item.

The above has been described principally in terms of a
service by which an individual customer interacts
directly with a service in real-time (either passively
or expressly or both). However, the service may equally
well be provided to customers indirectly via the medium
of a third party such as, for example, a salesperson or
call centre operative.

Knowledge and analysis about customer and item profiles
that the filtering method of the invention can generate
can be sold directly to companies interested in market
research in the appropriate markets.

Where information in the customer database is dated,
knowledge discovery could be focussed also on whether
there are marketing-relevant trends in customer
behaviour. Services could reflect the types of
analytics described in the rest of the document except
that they are carried out on behalf of the client on a
consultancy basis rather than by the client themselves.

The following describes the commonality between the
various methods described above.

1. The set up

We have a data set D about a set of cases. For each case
i = 1, ..., I the data contains a set y_i of observations
about items j = 1, ..., J. We want to build a
predictive model for these items. Two paradigm cases
arise which are dealt with in essentially the same way.



1. Data is binary and there are no missing values.
Examples include where observations about items record

- whether a user has or has not visited a web page

- whether the customer has or has not bought an item

and where the prediction task is to predict how likely one
of the items is to have been selected from amongst those
items that have not in fact yet been selected.

2. Data contains missing observations (see section on
missings), and the prediction task
is to predict what an observation for an item would be
if it was not missing.

Throughout, P(\xi | \theta) denotes the probability of random
variable \xi given the particular value of variable \theta, and
L(\theta) denotes the log likelihood of observations given the
particular value of \theta: L(\theta) = \ln P(\xi | \theta).

1.1 The central concepts

Item model f(y | a_i, b_j, .), \hat{y}(a_i, b_j, .)

The item model links an observation about an item to a
case profile a_i. There is one function per item and they
are the keys to the method. Once specified they allow
us to go back and forth between observations, case
profiles, and predictions about observations. One form
of item model is in terms of a modelled observation and
an error.

    y_ij = \hat{y}(a_i, b_j, .) + e_ij

where e_ij is an error term equal to the difference
between the modelled and the actual observation.
Another form is in terms of a probability distribution
over possible observations f(y | a_i, b_j, .) = P(y_ij = y | a_i, b_j, .).
These are closely related. If a probability
distribution for the error term is specified then they
are equivalent as

    f(y | a_i, b_j, .) = P(y_ij = y | a_i, b_j, .)
                       = P(e_ij = y - \hat{y}(a_i, b_j, .))

To keep descriptions clear we will often use just the
version in terms of probability functions. It will be
obvious how to proceed in the alternative case.
The functions are written to indicate that, in general,
they may take arguments in addition to the item and case
profiles. For convenience we may sometimes omit this
additional dependence in the notation.

Item profile b_j

This specifies the parameters of the model for the item.
It may include terms that identify which from a set of
possible functional forms is being used. The set of all
item profiles is B.

Case profile a_i

This specifies the case in terms that include metrical
latent components. It does not include observations
about other items. The set of all case profiles is A.

1.2 The key steps

The method involves a number of steps, each of which
estimates some of the parameters in the item models. The
estimation procedure may lead to point estimates of the
parameters, or to density estimates that specify a
probability distribution over some range of possible
values. Estimated variables are shown with a hat in
what follows.

D Step: Specify the data (Y, .) which includes the
observations Y about items.

M Step: Specify a model of the data M(Y, A, B, .) that
includes as sub-models the item models f. The
specification includes the range of allowable free
parameters.

B Step: Estimate the item profiles. Take the
observations and, using the model, derive estimates of
the item profiles by trying to get a good fit to the
data. Schematically we can write:

    M(Y, .) -> \hat{B}

A Step: Estimate a case profile. Take the models,
estimated item profiles and observations for one case,
and get the case profile. Schematically the step
involves:

    y_i, \hat{B} -> \hat{a}_i

Y Step: Make predictions about observations regarding
items for a case. Take the model and estimates of the
case profile and item profile to give predicted
observations. Schematically:

    \hat{a}_i, \hat{B} -> \hat{y}_i

We have described the A and Y steps as separate. In
practice many related steps may be carried out together
and it may be more efficient to code them together.
Nevertheless conceptually the method can be expressed in
these two different steps.

2. M Step

The item model for item j has as parameters the item
profile b_j and takes as an argument a case profile. In
all the embodiments we discuss it does not depend
directly on observations about other items. In
particular this means that:



• Where the model is given as a probability
distribution over observations then this
distribution does not depend on observations about
other items.

• Where the model is given in terms of a modelled
observation this modelled observation does not
depend on observations about other items and the
errors are treated as independent random variables.

Examples of functional forms include ones where:

• the case profile has Q components

• the item profile has Q + 1 components

• the distribution of an observation depends on
  b_j0 + \sum_{q=1}^{Q} a_iq b_jq



The way in which observations depend on the profiles 
depends on the kind of observation. 

Continuous variables - examples include

• ratings (even if ratings are picked from a finite
set, it might be convenient to model them as
continuous),

• length of time viewing a web-page,

• covariates such as age.

A possible model of continuous variables is:

    \hat{y}(a_i, b_j) = b_j0 + \sum_{q=1}^{Q} a_iq b_jq

Binary variables - examples include

• whether or not a customer has visited a web-page
this session

• whether or not a customer has a pension

A possible model of binary data is

    P(1 | a_i, b_j) = logit^{-1}(b_j0 + \sum_{q=1}^{Q} a_iq b_jq)

where logit^{-1}(x) = 1/(1 + e^{-x}). This is a common
specification for binary data but many others are
possible as well.

A simple alternative is to use the model specified above 
for continuous data. Examples of ways to model ordinal 
and categorical variables are known. See for example 
Bartholomew and Knott (99).
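
The two item models for continuous and binary observations can be sketched as follows. This is a minimal illustration, assuming that an item profile is stored as a vector (b_j0, b_j1, ..., b_jQ) and a case profile as (a_i1, ..., a_iQ); the function names are hypothetical.

```python
import numpy as np

def linear_predictor(a, b):
    """b_j0 + sum_q a_iq * b_jq for case profile a (Q,) and item profile b (Q + 1,)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return b[0] + a @ b[1:]

def predict_continuous(a, b):
    """Modelled observation for a continuous item."""
    return linear_predictor(a, b)

def inv_logit(x):
    """logit^{-1}(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def prob_binary_one(a, b):
    """P(1 | a_i, b_j) for a binary item under the logit model."""
    return inv_logit(linear_predictor(a, b))
```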

2.3 Indeterminacy

A feature of many of the models we describe is that, 
without additional assumptions, many different sets of 
item profiles give a good fit to the data. One option 
is to accept any set as estimates of the item profiles. 
Another is to make additional assumptions. These 
additional assumptions can improve the intelligibility
of the result by making it easier to compare results 
from different runs and using different data. 

If the model depends on case and item profiles via the
function b_j0 + \sum_{q=1}^{Q} a_iq b_jq then an assumption that removes
one source of indeterminacy is to require that each
component of the case profile has unit variance and zero
mean.
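
One way to impose this normalisation after estimation is sketched below. It assumes case profiles are held in an I x Q array and item profiles in a J x (Q + 1) array whose first column is b_j0; the function name is hypothetical. The item profiles are adjusted so that every value of b_j0 + \sum_q a_iq b_jq is unchanged.

```python
import numpy as np

def standardise_profiles(A, B):
    """Rescale so each case-profile component has zero mean and unit variance.

    A: (I, Q) case profiles; B: (J, Q + 1) item profiles with B[:, 0] = b_j0.
    Returns (A_std, B_adj) giving the same values of b_j0 + sum_q a_iq b_jq.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    mean = A.mean(axis=0)
    std = A.std(axis=0)  # assumed non-zero for every component
    A_std = (A - mean) / std
    B_adj = B.copy()
    B_adj[:, 0] = B[:, 0] + B[:, 1:] @ mean   # absorb the shift into the constant
    B_adj[:, 1:] = B[:, 1:] * std             # absorb the scale into the weights
    return A_std, B_adj
```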

Those familiar with latent variable models will also be
familiar with the indeterminacy known as rotation
issues. In what follows we have used the default, i.e.
unrotated, output from packages but it will be clear how
to use rotated output if available.



3. B Step 



In Step B the item profiles are estimated as those for
which the item models fit the data well.

1. If the item models are expressed in terms of a
modelled observation, then choose item profiles that
approximate those that minimise a function of the
errors, e.g. the sum of errors squared.

2. If the item model is expressed in terms of a
probability distribution over observations then choose
item profiles that approximate those that maximise the
likelihood of the data. In practice we generally seek
to maximise the log of the likelihood as this is more
tractable. Item profiles that maximise one will
maximise the other also.



It is well known that these two general approaches are
closely related, and indeed that in many cases there are
distributional assumptions and functions of the errors
that make them formally identical. To keep the
description concise we will typically express the
methods in terms of maximising the likelihood of the
data, but it will be clear how to describe them in terms
of minimising a function of the errors.

Fitting the model to the data would be a straightforward
task if the case profiles were known. However the case
profiles are not, at this stage, known. We give some
examples of ways to estimate the item profiles in these
circumstances.

3.1 One preferred method (Approach 2) 

This method treats the case profiles as parameters to be
estimated along with the item profiles. The method is to
estimate the item and case profiles jointly so that the
item models fit the data.

The loglikelihood of the observations about items, as a
function of both case and item profiles is

    L(A, B) = \sum_{i} \sum_{j} \ln P(y_ij | A, B)

The method is to choose item and case profiles that
approximately maximise the loglikelihood:

    (\hat{A}, \hat{B}) = \arg\max_{(A,B)} L(A, B)

The following method will give estimates that locally
maximise the likelihood of the data. Experiment
suggests that local maxima have similar likelihoods, so
that in many cases it may be sufficient to accept the
parameter estimates from a single run through these
steps. Alternatively choose n (n=3 for example)
different starting values, and choose the resulting
parameter estimates associated with the highest
likelihood.

30 



The steps in the method are: 

1. Define two sets of log likelihood functions, one
for the case profiles a_i, i = 1, ..., I as a function of
known item profiles,

    L(a_i | B) = \sum_{j=1}^{J} \ln f(y_ij | a_i, b_j)

and one for the item profiles b_j, j = 1, ..., J as a
function of known case profiles.

    L(b_j | A) = \sum_{i=1}^{I} \ln f(y_ij | a_i, b_j)

2. Choose starting values B^0 = (b_1^0, ..., b_J^0) for the
item profiles. These can be drawn at random.
Alternatives include item profiles from previous
runs of the model. It will be apparent that an
alternative method is to start with values for A^0, with
obvious consequential changes.

3. Then iterate the following two steps until there is
convergence.

(a) Choose A^{t+1} = (a_1^{t+1}, ..., a_I^{t+1}) to maximise the
log likelihood, given item profiles B^t

    a_i^{t+1} = \arg\max_{a_i} L(a_i | B^t)

(b) Choose B^{t+1} to maximise the log likelihood,
given case profiles A^{t+1}

    b_j^{t+1} = \arg\max_{b_j} L(b_j | A^{t+1})

4. Set \hat{B} equal to the converged value of B^t and \hat{A} to
the converged A^t.

It will be apparent that some method for deciding
whether the iterative procedure has converged or not
will be needed. There are many ways to do this. An
obvious method is to calculate the log likelihood of the
data at the end of step (b) and to consider the procedure
to have converged if the percentage change in the log
likelihood is less than some pre-set value, such as 0.1.
The advantage of this iterative method is that, at each
stage (a) or (b), the method involves estimating the
parameters of a straightforward prediction function for
a single dependent variable in terms of a number of
known explanatory variables. This is the standard
situation in statistical and econometric modelling, so
that a wide variety of techniques, approaches, and fully
worked examples for particular functional forms are
known and can be used. Known examples include the
functional forms for binary and continuous data suggested
earlier.
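
For continuous observations the iteration can be sketched as below. This is an illustration rather than the definitive procedure: it assumes Gaussian errors, so that each maximisation in steps 3(a) and 3(b) is an ordinary least-squares regression, and it stops on the relative change in the sum of squared errors. The function name and the default tolerance are hypothetical.

```python
import numpy as np

def fit_alternating(Y, Q, tol=1e-3, max_iter=200, seed=0):
    """Alternate between case and item profiles for continuous observations.

    Y is (I, J); returns A (I, Q) and B (J, Q + 1) with B[:, 0] = b_j0.
    """
    Y = np.asarray(Y, dtype=float)
    I, J = Y.shape
    rng = np.random.default_rng(seed)
    B = rng.normal(size=(J, Q + 1))          # step 2: random starting values B^0
    B[:, 0] = Y.mean(axis=0)
    prev_sse = None
    for _ in range(max_iter):
        # step 3(a): for each case, regress (y_i - b_0) on the item weights
        A = np.linalg.lstsq(B[:, 1:], (Y - B[:, 0]).T, rcond=None)[0].T
        # step 3(b): for each item, regress y_j on [1, a_i]
        X = np.column_stack([np.ones(I), A])
        B = np.linalg.lstsq(X, Y, rcond=None)[0].T
        sse = float(np.sum((Y - X @ B.T) ** 2))
        # convergence test on the relative change in the fit criterion
        if prev_sse is not None and prev_sse - sse <= tol * max(prev_sse, 1e-12):
            break
        prev_sse = sse
    return A, B
```

For binary data the same loop applies with each least-squares call replaced by a logistic regression.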


3.2 Latent variable method

The latent variable method treats the case profiles as
unobserved random variables. It fits the data by
finding point estimates of the item profiles that
maximise the likelihood of the data, given a prior
distribution for the unobserved case profiles. An
alternative, approximate, method finds point estimates of
the item profiles that give a good fit of the model
correlation matrix to the correlation matrix for the
data.

One way to estimate the item profiles is to treat each
case profile as an unobserved random variable. This is
the approach to estimating latent variable models
(including factor analysis, latent trait analysis and
similar models) and many examples and methods are known.
Many are described in Bartholomew and Knott (99). In
this literature the item profiles are often referred to
as factor loadings.

3.3 Latent Variable Method I - Full Information
Maximum Likelihood

This note describes a method for estimating latent
variable models based on maximising the likelihood
function.

1. Make a distributional assumption about the case
profiles. The usual assumption is that they are
standard normal, a_iq ~ N(0,1), and are statistically
independent of the errors. In addition it is
usually assumed that the case profile components
are statistically independent of each other.

2. Write down the expected log likelihood of the data.
The probability of any particular case is:

    P(y_i | a, B) = \prod_{j=1}^{J} P(y_ij | a, B)

a is an unobserved random variable and the expected
probability (or equivalently the expected likelihood or
marginal distribution) of y_i is:

    P(y_i | B) = \sum_{a} P(a) \prod_{j=1}^{J} P(y_ij | a, B)

Looking at all observations in the dataset together
gives the overall expected probability (or equivalently
the expected likelihood or marginal distribution):

    P(Y | B) = \prod_{i=1}^{I} \sum_{a} P(a) \prod_{j=1}^{J} P(y_ij | a, B)

The log likelihood of item profiles B is the log of this:

    L(B) = \ln P(Y | B)
         = \sum_{i=1}^{I} \ln \sum_{a} P(a) \prod_{j=1}^{J} P(y_ij | a, B)

3. Estimate item profiles to maximise the log
likelihood.

    \hat{B} = \arg\max_{B} L(B)
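
The log likelihood L(B) can be evaluated numerically as sketched below. The sketch makes several assumptions that are not fixed by the text at this point: binary observations, a logit item model, a single profile component (Q = 1), and a five-point discrete prior on {-2, ..., 2} with Binomial(4, 1/2) weights standing in for N(0,1). The function name is hypothetical.

```python
import numpy as np
from math import comb

def marginal_log_likelihood(Y, B):
    """L(B) = sum_i ln sum_a P(a) prod_j P(y_ij | a, b_j) for binary Y, Q = 1.

    Y: (I, J) array of 0/1; B: (J, 2) item profiles (b_j0, b_j1).
    """
    Y = np.asarray(Y, dtype=float)
    B = np.asarray(B, dtype=float)
    points = np.arange(-2, 3, dtype=float)                    # a in {-2, ..., 2}
    prior = np.array([comb(4, k) for k in range(5)]) / 16.0   # Binomial(4, 1/2)
    # p[k, j] = P(y_j = 1 | a = points[k], b_j) under the logit model
    p = 1.0 / (1.0 + np.exp(-(B[:, 0] + np.outer(points, B[:, 1]))))
    # lik[i, k] = prod_j P(y_ij | a = points[k], b_j)
    lik = np.prod(np.where(Y[:, None, :] == 1, p[None], 1 - p[None]), axis=2)
    return float(np.sum(np.log(lik @ prior)))
```

A general-purpose optimiser, or the EM algorithm, could then be applied to this function to carry out step 3.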

3.3.1 EM algorithm 

Step 3, the estimation of the parameters, can be 
difficult. One method is to use a well known iterative 
scheme known as the EM algorithm. The EM algorithm 
iteratively estimates parameters that maximise the 
expected value of the log likelihood of the observations 
and case profiles, where the expectation is with respect 
to the density estimates of the case profiles. Thus the 
EM algorithm jointly estimates case and item profiles. 
The application of this algorithm to latent variable 
models is described in Bartholomew and Knott (99) where 
they give examples for different kinds of variable. 

Full information maximum likelihood methods
have been implemented in a number of software
programmes; for example, TWOMISS estimates models for
binary data for Q = 1 or 2. The software is available on
a website of the publishers of Bartholomew and Knott
(99), arnoldpublishers.com/support/lvmfa2.htm.

The program is described in the document latv.pdf
available on the site. This document also contains a
detailed description of the model and the EM method of
estimation. References to other packages for binary and
other models can be found in Bartholomew and Knott (99).

3.4 Latent Variable Method II - Fitting the correlation
matrix

An alternative method that can be used whenever
observations are ordered variables is based on 2 steps:

1. recast the model so that it reflects an underlying
linear model

2. estimate the parameters of the underlying linear
model by fitting the covariance or correlation
matrix.

This method is generally fast because only summary
statistics are needed.

3.4.1 The underlying linear model 

The linear model assumes that observations are random
variables with distribution:

    y_ij = \beta_{j0} + \sum_{q=1}^{Q} a_iq \beta_{jq} + e_ij

where the error term e_ij is a random variable with zero
mean and variance \psi_j, which is independent of the
observations, of the case profile, and of other error
terms, and the q'th component a_iq of the case profile is
a random variable with mean zero and unit variance.
This model implies a covariance matrix of

    \Sigma = \beta \beta^T + \Psi,  \Psi = diag(\psi_1, ..., \psi_J)
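
The covariance matrix implied by a linear factor model can be computed as sketched below. The sketch assumes uncorrelated, unit-variance case-profile components and independent errors, in which case the implied covariance is the loadings times their transpose plus a diagonal matrix of error variances. The function and argument names are hypothetical.

```python
import numpy as np

def implied_covariance(beta, psi):
    """Covariance implied by the linear factor model: beta beta^T + diag(psi).

    beta: (J, Q) loadings (excluding the constants); psi: (J,) error variances.
    """
    beta = np.asarray(beta, dtype=float)
    return beta @ beta.T + np.diag(np.asarray(psi, dtype=float))
```

Fitting the model then amounts to choosing the loadings and error variances so that this matrix is close to the sample covariance (or correlation) matrix.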

3.4.2 Estimating the parameters of the linear model

One method for estimating the profiles of the linear
model is to fit the covariance matrix for the model to
that of the data. The programme LISREL does this. The
correlation matrix can be used in place of the
covariance matrix. The steps of the method are:

1. Calculate the correlation matrix for the
observations. This can be done using standard
statistical packages such as S-PLUS or PRELIS
(distributed with LISREL).

2. Assume that the components of the case profile are
independent and use standard factor analysis, for
example using S-PLUS, of the correlation matrix to
estimate the \beta parameters.

3.4.3 Recasting the original model in terms of an
underlying linear model

The method can be used for different types of
observation. Examples are described in Bartholomew and
Knott (99).



Continuous variables. The \beta variables can be identified
directly with item profiles.

Binary variables. In this case the method is

1. assume that underlying each item j there is a
continuous variable z_j and a threshold \tau_j. Together
these determine the observations for that item - an
observation is 1 if z is above the threshold, and 0
otherwise.

    y_ij = 1 if z_ij > \tau_j
           0 otherwise

2. Under this assumption calculate a tetrachoric
correlation matrix from the observations. This is
a known technique that estimates the correlation
matrix of the inferred underlying variables. The
estimation can be done using PRELIS.

3. Estimate the linear model for these underlying
variables, generating estimates for the \beta
parameters.

To recover the item profiles for a model of binary data
from these parameter estimates:

1. Use the logit model for binary data

2. Derive the item profiles b_jq for the binary
observation model from these factor loadings
according to:

    b_jq = (\pi / \sqrt{3}) \beta_{jq} / \sqrt{1 - \sum_{q=1}^{Q} \beta_{jq}^2}    (1)

for q \neq 0, and logit^{-1}(b_j0) = \pi_j where \pi_j = the
proportion of observations of item j equal to 1

3. There is an exception to the equation (1) above.
In some cases the item profiles from the linear
factor model are such that

    \sum_{q=1}^{Q} \beta_{jq}^2 \geq 1

in which case the equation in (1) does not give
sensible results. These cases are known as Heywood
cases. For Heywood cases (in practice whenever

    \sum_{q=1}^{Q} \beta_{jq}^2 \geq 0.9)

we replace the relevant part of (1) with (2) below.



n 








2-E(3 y? ) 2 







(2) 



In doing so we follow one of the suggestions of
Bartholomew and Knott in section 3.18 of their
book. We could alternatively have used other known
methods for dealing with Heywood cases.

Ordinal data - Bartholomew and Knott (99) describe a way
to recast ordinal variable problems in terms of an
underlying continuous model.

3.5 2 Stage method 



The 2 stage method is another method that fits the data
by finding point estimates of both item and case
profiles. It first estimates case profiles using a
simple linear model. Then, treating these as observed
variables, it estimates item profiles.

The method is in two stages.

1. Generate estimated user profiles

2. Estimate the item profiles treating user profiles
as known.




3.5.1 B Step

1. Derive pseudo-item profiles

Use a simple linear model to derive pseudo-item
profiles. Appropriate examples include the normal
linear factor model and Principal Component
Analysis.

2. Generate estimated user profiles

Derive point estimates of each case profile \hat{a}_i
using the pseudo-item profiles. One method is to
use the A Step of the PCA method.

3. Estimate the item profiles treating user profiles
as known

Now that we have estimates of the user profiles,
these can be treated as known in the item models,
leaving only the item profiles as free parameters.
The item profile for item j can now be estimated
by:

(a) write down a set of loglikelihood
functions, one for each item, as a
function of known case profiles

    L(b_j | \hat{A}) = \sum_{i=1}^{I} \ln f(y_ij | \hat{a}_i, b_j)

(b) choose an item profile for j that
maximises the loglikelihood,

    \hat{b}_j = \arg\max_{b_j} L(b_j | \hat{A})

There are a wide range of estimation
procedures for this kind of problem.

3.5.2 Applying the method to different types of item 

We described the method as though all items were
considered together when deriving the pseudo-item
profiles and the estimates of the user profiles. In
some cases it might be appropriate to consider items in
separate groups, with separate sets of user profile
components associated with each group. For example, the
dataset of observations about a user may contain some
items relating to preferences over objects, and some
indicators of socioeconomic group. Treating these two
groups separately reduces the number of free parameters
that need to be estimated for a given number of overall
components in a user profile. If the two groups do
largely act as indicators of different components of the
user's profile then this approach can lead to better
estimates of the parameters that remain and to more
accurate predictions. The method is:



1. Estimate pseudo item profiles and case profiles for
each group of items separately. The number of
components in group g is Q_g.

2. Combine the case profiles from the different
groups, so that each case profile contains \sum_g Q_g
components.

3. Continue as before.



3.6 Principal Components Analysis

Principal components analysis generates a mathematical 
transformation of the observations that gives both item 
profiles and case profiles. 

This section describes a method for using Principal
Components Analysis (PCA) to find the item profiles. As
a technique PCA has the advantage that it is quick, and
routines to implement it are well known and widely
available in statistical packages.

3.6.1 The theory 

PCA is a well known procedure that is used to reduce the
dimensionality of a dataset while minimising the loss of
information. The method is to transform the original
variables for a case, y_ij, j = 1, ..., J, to a new set of
uncorrelated variables, a_iq, q = 1, ..., Q, called
principal components, which contain most of the
information about the variance in the original data.
These new variables are linear combinations of the
original variables so that:

    a_iq = b_1q (y_i1 - b_10) + ... + b_Jq (y_iJ - b_J0),  q = 1, ..., Q

or more compactly A = B^T (Y - B_0). Here b_j0 is the
average value for observations y_ij about item j. B^T
denotes the transpose of the item profile matrix,
omitting the constant terms B_0. We impose the
normalisation that

    \sum_{j=1}^{J} (b_jq)^2 = 1

The first principal component, a_i1, is found by choosing
b_j1, j = 1, ..., J, so that a_i1 has the largest possible
variance. The second principal component is found by
choosing b_j2 so that a_i2 has the largest possible variance
subject to it being uncorrelated with the first
principal component and so on.

This approach models the data in the following sense.
If the number of principal components is equal to the
number of original variables (Q = J) then it is a result
of linear algebra that we can invert the equations to
write Y = B_0 + BA. If we ignore some of the later
transformed variables (Q < J) that account for only a
small part of the variance, then we can get a model of
the data \hat{Y} = B_0 + BA which will have the property that
errors between \hat{Y} and Y will be small.

3.6.2 B Step in practice 

1. Calculate the covariance matrix for the data. This
can be done using a standard stats package.

2. Find the Q principal components of the data by
analysis of the covariance matrix. This can be
done using standard statistical packages such as
S-PLUS. (In practice packages can also take the
raw data as an input and calculate the matrix as
part of the estimation procedure.)

3. For each item j set b_j0 equal to the average
observation for that item.

4. For each item j and component q \neq 0 set b_jq equal to
the weighting associated with item j on the q-th
principal component.
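
The B Step with PCA can be sketched with NumPy in place of a statistical package. This is an illustration only: it assumes the observations are held in an I x J array with no missing values, and the function name is hypothetical.

```python
import numpy as np

def pca_item_profiles(Y, Q):
    """Return B (J, Q + 1): column 0 is the item mean, columns 1..Q the weights."""
    Y = np.asarray(Y, dtype=float)
    b0 = Y.mean(axis=0)                        # b_j0: average observation per item
    cov = np.cov(Y, rowvar=False)              # J x J covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:Q]      # indices of the Q largest
    return np.column_stack([b0, eigvecs[:, order]])
```

The eigenvectors returned by `np.linalg.eigh` have unit length, so the weights satisfy the normalisation that the squared weights for each component sum to 1.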

4. Making Predictions

We give a number of examples.

4.1 Example One (Approach 2)

• A step - derive a point estimate \hat{a}_i of the case
profile

• Y step - enter that point estimate into the
relevant item model or models to derive a point
prediction of the observation for that item.

4.1.1 A step 

Within the literature on hidden variable models various
statistical methods have been described to derive a
point estimate of the true value of the case profile.
Examples are described in Bartholomew and Knott (99),
the LISREL 8 handbook [LISREL 8: User's Reference Guide,
(1996) Joreskog and Sorbom, publ. Scientific Software
International] and in references therein. The method we
describe here is to maximise the likelihood of the data.

1. Take all the observations about a case as the
sample. The same case profile will enter into the
model for each of these observations, but the item
profiles will be different for each.

2. Treat the observations as the dependent variables,
the item profiles as the explanatory variables, and
the case profile as the parameters to be estimated.

3. Define a log likelihood of the data for a case
profile as L(a_i | B) = \sum_{j=1}^{J} \ln f(y_ij | a_i, b_j).

4. Estimate the case profile to maximise the
likelihood of the data: \hat{a}_i = \arg\max_{a_i} L(a_i | B).

This last step involves the same calculations as step
3(a) in the iterative process to derive item profiles in
the Approach 2 method for item profiles.
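
For binary observations the A step can be sketched as a logistic regression of the observations on the item weights, with the constants b_j0 entering as a fixed offset. The sketch assumes the logit item model and uses Newton-Raphson, which is only one of many possible optimisers; the function name and iteration count are hypothetical.

```python
import numpy as np

def estimate_case_profile(y, B, n_iter=25):
    """Maximum-likelihood case profile for binary observations (logit model).

    y: (J,) array of 0/1 observations for one case.
    B: (J, Q + 1) estimated item profiles, B[:, 0] holding b_j0.
    Assumes the maximum is finite (the observations are not separable).
    """
    y = np.asarray(y, dtype=float)
    B = np.asarray(B, dtype=float)
    X, offset = B[:, 1:], B[:, 0]
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(offset + X @ a)))
        gradient = X.T @ (y - p)                          # dL/da
        information = X.T @ (X * (p * (1 - p))[:, None])  # minus the Hessian of L
        a = a + np.linalg.solve(information, gradient)
    return a
```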

4.1.2 Y step 

Using the estimated case and item profiles, predict
observations \hat{y}_ij about items using the item model.
It will be clear that in many cases a suitable point
prediction is the expected observation

    \hat{y}_ij = \sum_{y} y f(y | \hat{a}_i, \hat{b}_j)

With binary data this reduces to \hat{y}_ij = f(1 | \hat{a}_i, \hat{b}_j).
Equally it will be clear that we could use information
about the predicted distribution.

4.2 Bayesian

A better method is to use Bayesian updating. This is a
statistical method that treats the customer profile as a
random variable with a specified distribution.
Alternatively we can say that it treats the customer
profiles as parameters, but that knowledge of the
parameters is probabilistic and prior knowledge is given
by a distribution.

This method has advantages.

• It is consistent with the latent variable method
for estimating item profiles in the following
sense. In the latent variable approach all that is
known about a user's profile, given their
observations, is contained in the Bayesian
posterior distribution over possible profiles.

• It is conservative, in the sense that any point
estimate of a user's profile based on the Bayesian
posterior will not be very sensitive to small
changes in the observations. This reduces the
potential for overfitting and improves the accuracy
of out of sample predictions.

• Unlike the Approach 2 A step, it can be used even if
item models have different forms.



4.2.1 A step 

1. Specify a prior distribution over case profiles.
Experiment suggests that the exact form of the
prior has little effect on the results.

(a) To be consistent with the assumptions made
when estimating the item profiles using the
latent trait method, we assume that each
component of the case profile has a standard
normal distribution, a_iq ~ N(0,1). In
practice we will need to approximate this
using a discrete distribution. In the
examples we used a binomial distribution with
a sample size of 4, where the number of
successes is transformed so that they are
evenly distributed about 0. Thus a_iq \in {-2, -1, 0, 1, 2} and:

    P(a_iq) = (1 / 2^4) 4! / ((2 + a_iq)! (2 - a_iq)!)



(b) An alternative method when using the 2 stage,
Approach 2 or PCA methods for estimating item
profiles is to generate a prior distribution
during the B step. The method is to use the
actual distribution of case profiles as the
prior distribution. To be practical the
actual distribution needs to be approximated
by a discrete distribution with a small number
of points. Various methods are obvious. For
example, for the 2 stage process a simple
example could be to (i) set out the discrete
values that each profile component can take
when making recommendations, say
a_iq \in {-2, -1, 0, 1, 2}; (ii) set P(a_iq) equal to the
proportion of cases for which the estimated
profile component \hat{a}_iq is closest to a_iq. For
example P(a_i2 = -1) will be the proportion of
cases for which \hat{a}_i2 lies between -1.5 and -0.5.

Another example suitable for any of these
methods is:

(i) for each component q calculate the
standard deviation \sigma_q

(ii) define the discrete values that each
profile component can take when making
recommendations as a_iq \in {-2\sigma_q, -\sigma_q, 0, \sigma_q, 2\sigma_q}

(iii) set P(a_iq) equal to the proportion of
cases for which the estimated profile
component \hat{a}_iq is closest to a_iq.

2. Update the distribution over possible case profiles
in the light of observations about the case to give
a posterior distribution P(a_i | y_i) using Bayesian
inference. Standard calculations give:

    P(a_i | y_i) = P(a_i) P(y_i | a_i, B) / \sum_{a} P(a) P(y_i | a, B)

where P(a_i) = \prod_{q=1}^{Q} P(a_iq) and
P(y_i | a_i, B) = \prod_{j=1}^{J} f(y_ij | a_i, b_j).

4.2.2 Y step



30 



The probabilistic knowledge of the case profile can be
combined with the item models in a number of ways to
predict observations. A simple approach is to take the
expected observation as the prediction:

$$\hat{y}_{ij} = \sum_{y} y \sum_{a_i} P(a_i \mid y_i)\, f(y \mid a_i, b_j)$$

In the example of binary data where observations are
either 0 or 1, this simplifies to:

$$\hat{y}_{ij} = \sum_{a_i} P(a_i \mid y_i)\, f(1 \mid a_i, b_j)$$

Equally clearly, if further steps depend on the whole
distribution $g(y_{ij})$ over observations then a suitable
form would be

$$g(y_{ij}) = \sum_{a_i} P(a_i \mid y_i)\, f(y_{ij} \mid a_i, b_j)$$
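For binary data the Y step is a posterior-weighted average of the item model. A minimal sketch follows, assuming the logit item model of the examples; the posterior values and the item profile are hypothetical, and `predict_visit` is an illustrative name.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_visit(posterior, b):
    """Expected binary observation: sum over a of P(a | y_i) * f(1 | a, b_j)."""
    return sum(
        p * inv_logit(b[0] + sum(aq * bq for aq, bq in zip(a, b[1:])))
        for a, p in posterior.items()
    )

# Hypothetical posterior over a one-component profile grid.
posterior = {(-1,): 0.2, (0,): 0.5, (1,): 0.3}
b = (0.0, 1.0)                        # assumed item profile (b_j0, b_j1)
y_hat = predict_visit(posterior, b)
```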



4.3 PCA

The best method would be to use a Bayesian method with
PCA.

A fast and simple alternative is to use the PCA
equations to define a PCA method.

A Step:

$$a_{iq} = b_{1q}(y_{i1} - b_{10}) + \dots + b_{Jq}(y_{iJ} - b_{J0}), \quad q = 1, \dots, Q$$

Y Step: The prediction step also uses the PCA model
directly to give:

$$\hat{y}_{ij} = b_{j0} + \sum_{q=1}^{Q} a_{iq} b_{jq}$$
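The two PCA steps are simple linear maps, which the following sketch makes concrete. The item means and loadings are hypothetical values chosen only for illustration, and the function names are assumptions.

```python
def pca_a_step(y, b0, B):
    """a_iq = sum_j b_jq * (y_ij - b_j0); B[j][q] is component q of item j."""
    n_components = len(B[0])
    return [
        sum(B[j][q] * (y[j] - b0[j]) for j in range(len(y)))
        for q in range(n_components)
    ]

def pca_y_step(a, b0_j, b_j):
    """Predicted observation for item j: b_j0 + sum_q a_iq * b_jq."""
    return b0_j + sum(aq * bq for aq, bq in zip(a, b_j))

# Hypothetical model with J = 3 items and Q = 2 components.
b0 = [0.5, 0.25, 0.75]
B = [[0.6, 0.0], [0.0, 1.0], [0.8, 0.0]]
y = [1, 0, 1]

a = pca_a_step(y, b0, B)              # case profile
y_hat_2 = pca_y_step(a, b0[1], B[1])  # prediction for the second item
```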
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4 . 4 Using a reduced set of case observations $J 

In some circumstances we may want to make
predictions about an observation for an item in the
light of what is known about observations only in
respect of other items. The most important example is
where data records which items a customer has selected
previously, and the task is to predict whether a
particular item is likely to be selected. Ideally the
observation that the item has not yet been selected is
ignored. In other words predictions about item j are
made in the light of a reduced set of case observations
which omits observation $y_{ij}$.



Where predictions need to be made about a number of
items, the ideal process would be, for each item j for
which a prediction is needed:

A Step - generate knowledge about the case profile using
the reduced set of case observations that omits the
observation about item j

Y Step - use the knowledge so generated to make a
prediction about item j.

This ideal approach does involve some sacrifice of speed
and a faster, though less accurate, alternative is to:

A Step - generate knowledge about the case profile using
either the full set of observations about the case
(suitable when making predictions only about a small
number of items), or using a reduced set of observations
that omits the observations about all the items for
which predictions are needed (suitable when making
predictions about many items).

Y Step - use the knowledge so generated to make
predictions about all the relevant items.
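The difference between the ideal and the faster process can be sketched as follows. This is an illustration under stated assumptions: a Bayesian A step over a discrete grid, the binary logit item model, and hypothetical priors, item profiles and function names.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def likelihood(y, a, b):
    """Binary logit item model f(y | a, b), b = (b_j0, b_j1, ..., b_jQ)."""
    p = inv_logit(b[0] + sum(aq * bq for aq, bq in zip(a, b[1:])))
    return p if y == 1 else 1.0 - p

def posterior(prior, obs, items, omit=frozenset()):
    """A step: Bayesian update that skips every item index listed in omit."""
    joint = {}
    for a, p_a in prior.items():
        lik = 1.0
        for j, (y, b) in enumerate(zip(obs, items)):
            if j not in omit:
                lik *= likelihood(y, a, b)
        joint[a] = p_a * lik
    total = sum(joint.values())
    return {a: v / total for a, v in joint.items()}

def predict(post, b):
    """Y step: posterior-weighted probability that the observation is 1."""
    return sum(p * likelihood(1, a, b) for a, p in post.items())

def predict_ideal(prior, obs, items, targets):
    """One A step per target item, omitting only that item's observation."""
    return {j: predict(posterior(prior, obs, items, omit={j}), items[j])
            for j in targets}

def predict_fast(prior, obs, items, targets):
    """A single A step that omits the observations for all target items."""
    post = posterior(prior, obs, items, omit=set(targets))
    return {j: predict(post, items[j]) for j in targets}

# Hypothetical data: one component, three items, predictions for items 1 and 2.
prior = {(-1,): 0.25, (0,): 0.5, (1,): 0.25}
items = [(0.0, 1.0), (-0.5, 2.0), (0.5, 1.5)]
obs = [1, 0, 0]
ideal = predict_ideal(prior, obs, items, targets=[1, 2])
fast = predict_fast(prior, obs, items, targets=[1, 2])
```

With a single target item the two processes coincide; with several targets the fast version discards more information, which is the accuracy sacrifice described above.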

5. Using covariates

Covariates are variables with observations $z_{ik}$, $k = J+1, \dots, R$,
that are informative about a case, but which
are not items about which predictions are wanted.

5.1 Treating covariates as items

One straightforward way to incorporate some covariates
is to treat them as though they were items. For each
covariate to be treated this way:

D Step 1. Create a new item with index k with
          observations $z_{ik}$, $i = 1, \dots, I$.

M Step 2. Specify an item profile and model
          $f(y_{ik} \mid a_i, b_k)$, depending on the type of
          variable.

B Step 3. Estimate the profile for the covariates
          at the same time and in the same way as
          for the other items.

A Step 4. Update these case profiles in the light
          of observations about these covariates in
          exactly the same way as observations
          about other items.

Y Step    Do not make predictions about these
          covariates.



This approach will ensure that information about
covariates will influence predictions: observations
about covariates will be used to update a case profile,
and this will then affect predictions. The approach has
a number of advantages.

• It can cope easily with missing observations.

• The methods for all the steps D-A go through
unchanged.

• It is particularly easy to interpret the results
and to use covariates to help target messages: the
covariate profiles can be shown in visual
representations in exactly the same way as item
profiles.

5.2 Covariates as observed components of a case profile

Another way to treat covariates is as observed
components of a case profile.

5.2.1 M Step

One way to specify the model is to choose item models
that are functions of

$$b_{j0} + \sum_{q=1}^{Q} a_{iq} b_{jq} + \sum_{k=Q+1}^{K} z_{ik} b_{jk}$$

The item profile now has K rather than Q components.
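The extended linear predictor simply appends the observed covariates to the latent profile components. A minimal sketch, with hypothetical coefficient values and an assumed function name:

```python
def linear_predictor(b_j0, b_latent, a_i, b_cov, z_i):
    """b_j0 + sum_q a_iq * b_jq + sum_k z_ik * b_jk.

    b_latent holds the Q coefficients on the latent components a_i;
    b_cov holds the K - Q coefficients on the observed covariates z_i.
    """
    latent_part = sum(a * b for a, b in zip(a_i, b_latent))
    covariate_part = sum(z * b for z, b in zip(z_i, b_cov))
    return b_j0 + latent_part + covariate_part

# Hypothetical item with Q = 2 latent components and one covariate (K = 3).
eta = linear_predictor(-0.5, [1.0, 0.5], [1, -1], [0.25], [2])
```

The value `eta` would then be passed through whichever item model is chosen, for example the inverse logit for binary data.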
5.2.2 B Step

2 stage method - This method provides a straightforward
way to include some covariates as directly observed
components of the user profile. The method is:

1. Ignore these covariates when estimating the
pseudo-item profiles and case profiles.

2. Include the covariates as observed variables in the
item models.

3. Estimate the item profiles as before, treating both
the case profile and the covariates as observed
variables.

Latent variable method. Examples of estimating item 
profiles in latent variable models with covariates are 
known. For example see Moustaki (2001), "A general class 
of latent variable models for ordinal manifest variables 
with covariate effects on the manifest and latent 
variables", London School of Economics Statistics 
Research Report January 2001, LSERR58 , and references 
therein. 

5.2.3 A Step

Bayesian method - The method is unchanged, though the
functional forms of the equations will need to be able
to accommodate the covariates.

6. Using prior information about items

In many cases system administrators will have prior
knowledge about items. Examples include:

• What are the latent variables that determine
observations, and what items do they most affect.

• The time of year when it is best to visit
particular holiday destinations.

• Cost.

• The genre of movies.

Using this knowledge can be beneficial.

• It may improve accuracy, as it adds information
into the system, or reduces the number of free
parameters needed to fit the data well.

• It aids knowledge discovery and control by ensuring
the relationships in the model reflect the
administrator's prior knowledge.

One way to use any of these forms of prior knowledge
about items is to impose prior restrictions on the item
profiles.

6.1 Prior knowledge about the latent variables

One form of prior knowledge is about what the latent
variables that determine observations are, and which
observations are most strongly related to each of these
factors. One way to incorporate this knowledge is to
modify the model specification step as follows. The
other steps are unaffected.

6.1.1 M Step

1. Identify the underlying latent variables and list
which items are strongly related to which latent
variables.

2. Specify item models that are functions of
$b_{j0} + \sum_{q=1}^{Q} a_{iq} b_{jq}$.

3. Fix $b_{jq}$ to be 0 if item j is not strongly related to
latent variable q.

4. Set the correlations between components in the case
profile to be free parameters.

B step - A convenient method to estimate item profiles
is to use the LISREL package. The LISREL 8 manual
describes how to estimate models when some item profile
components are set to zero and where the correlations
between components are to be estimated.

7. Missing values

This section describes how to deal with cases where some
observations are missing (denoted x). Examples include
cases where:

• observations record a customer's own assessment of
the suitability of some of the items, for example
of movies or books. The recommendation task is to
predict the suitability of those items the customer
has not rated.

• observations record whether or not a customer
responded favourably to a cross-sell suggestion
made by a call centre operative. The observation
is 0 if the customer didn't take up the offer, 1 if
she did and missing if no offer for that item has
been made.

One method is to assume that observations are missing at
random, by which we mean that we assume that whether or
not an observation is missing is independent of the case
profile.

7.1.1 Example One (Approach 2)

When defining the likelihood function, omit observations
that are missing, or define their probability as equal
to something independent of the case profile (for
example equal to 1 or to the proportion of observations
about that item that are missing).

7.1.2 Latent trait - maximum likelihood methods

When defining the likelihood function, omit observations
that are missing, or define their probability as equal
to something independent of the case profile. The
programme TWOMISS does this for binary data when some
observations are missing at random.
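Omitting missing observations from the likelihood amounts to skipping them in the product over items. The sketch below assumes the binary logit item model and represents a missing observation as `None`; both choices, and the function names, are illustrative assumptions rather than the behaviour of TWOMISS.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_likelihood(obs, a, items):
    """Log-likelihood of one case, skipping observations recorded as None."""
    total = 0.0
    for y, b in zip(obs, items):
        if y is None:                 # missing at random: contributes nothing
            continue
        p = inv_logit(b[0] + sum(aq * bq for aq, bq in zip(a, b[1:])))
        total += math.log(p if y == 1 else 1.0 - p)
    return total

# Hypothetical case profile and item profiles (b_j0, b_j1).
a = (0.5,)
items = [(0.0, 1.0), (-1.0, 2.0)]
with_missing = log_likelihood([1, None], a, items)
observed_only = log_likelihood([1], a, items[:1])
```

Skipping an observation is equivalent to assigning it probability 1, which is one of the profile-independent choices mentioned above.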

7.1.3 Latent trait - assuming an underlying linear
factor model

Modify the procedure for calculating the estimated
correlation matrix for the inferred underlying continuous
variables. When estimating the correlation between the
inferred variables underlying observations for items j1
and j2, omit any cases for which either observation is
missing. PRELIS will do this automatically if the
option for pairwise deletion is specified when
estimating the correlation matrix.

7.1.4 PCA

Calculate the covariance matrix using pairwise deletion,
as for latent trait above.
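Pairwise deletion for a single covariance entry can be sketched as follows. This is a plain illustration, not the PRELIS procedure: missing values are represented as `None`, the means are taken over the pairwise-complete cases only (an assumption, since conventions differ), and the data are hypothetical.

```python
def pairwise_cov(data, j1, j2):
    """Sample covariance of items j1 and j2 using only cases where both are observed."""
    pairs = [(row[j1], row[j2]) for row in data
             if row[j1] is not None and row[j2] is not None]
    n = len(pairs)
    mean1 = sum(x for x, _ in pairs) / n
    mean2 = sum(y for _, y in pairs) / n
    return sum((x - mean1) * (y - mean2) for x, y in pairs) / (n - 1)

# Hypothetical binary data; None marks a missing observation.
data = [
    [1, 1, None],
    [0, 0, 1],
    [1, None, 0],
    [0, 1, 0],
    [1, 1, 1],
]
cov_01 = pairwise_cov(data, 0, 1)     # the third case is dropped for this pair
```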

7.2 A step

7.2.1 Bayesian

Ignore missing observations when updating beliefs about
a case profile.

7.2.2 Example One (Approach 2)

Omit missing observations from the sample used to fit
the case profile to the observations about that case.

7.2.3 PCA

Replace missing observations about item j with the
expected value $b_{j0}$.

8. Choosing the set of free parameters

So far we have assumed the set of free parameters is
fixed at the M Step. A better procedure is to choose
the set of free parameters in the light of the data.
This is an example of a model selection problem. In
choosing the set we need to balance two effects.
Increasing the number of parameters will, on the one
hand, give the model greater scope to fit complex
relationships between the variables and improve its
ability to predict behaviour out-of-sample. On the other
hand it will also increase the scope for the model to
fit idiosyncratic features of the training data which
are not seen in out-of-sample cases. This will harm the
model's ability to make good predictions.

There are many known methods for selecting between
models in the light of the data. We describe one
example.



8.1 The Akaike Information Criterion

The Akaike Information Criterion (the AIC) is one method
for balancing these two effects. The method scores a
model according to the likelihood of the data and a
penalty term that increases as the number of parameters
increases. More precisely, if $\hat{\theta}$ is the set of estimated
parameters for a model, $L(\hat{\theta})$ is the log-likelihood of
the data at those estimates, and p is the number of free
parameters, then the AIC is:

$$-2L(\hat{\theta}) + 2p$$

Models with low values of the AIC are preferred.

8.2 Choosing Q

One example of choosing the set of free parameters is to
use the AIC to choose the number of components Q. When
designing a rule to choose the number of components we
need to trade off accuracy of predictions against speed
and intelligibility of the resulting model. A simple
rule that did this could be:

1. Estimate the model with Q = 1, 2, and 3

2. Estimate the AIC for each number of components

3. Select the model with the lowest AIC

Latent trait method. In the latent trait method the
free parameters in the B Step are the item profiles.
These maximise the likelihood at B. Each item profile
is a list of Q + 1 numbers so that the AIC for Q is:

$$\mathrm{AIC}(Q) = -2L(B) + 2(Q+1)J$$
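The selection rule above can be sketched directly. The log-likelihood values below are purely hypothetical placeholders (in practice they would come from the B step for each Q), and the function names are assumptions.

```python
def aic(log_lik, n_params):
    """Akaike Information Criterion: -2 * log-likelihood + 2 * free parameters."""
    return -2.0 * log_lik + 2.0 * n_params

def aic_latent_trait(log_lik, q, n_items):
    """AIC(Q) when each of the J item profiles has Q + 1 free components."""
    return aic(log_lik, (q + 1) * n_items)

def choose_q(log_liks, n_items):
    """Return the Q with the lowest AIC, together with all the scores."""
    scores = {q: aic_latent_trait(ll, q, n_items) for q, ll in log_liks.items()}
    return min(scores, key=scores.get), scores

# Hypothetical maximised log-likelihoods for Q = 1, 2, 3 with J = 20 items.
log_liks = {1: -5200.0, 2: -5150.0, 3: -5140.0}
best_q, scores = choose_q(log_liks, n_items=20)
```

In this made-up case the gain in likelihood from a third component is too small to offset the 2J additional penalty, so two components are selected.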

The above explains how to find item profiles for given Q
using PCA. We also need to choose Q. PCA is a
mathematical procedure rather than a statistical model
so there is no statistical test that we can use to
decide when adding more components will make matters
worse rather than better.

One approach is to choose Q as the cutoff between
eigenvectors with eigenvalues greater than 1 and those
with eigenvalues less than 1. Examples suggest that
this can lead to a large number of components being
retained. Instead in our example we choose 3
components, as being a good compromise between lots of
components, which would lead to more accurate
predictions, and fewer components, which are easier for
system administrators to visualise.

8.3 Fixing item profile components

One way to reduce the number of free parameters is to
fix some of the item profile components, for example to
be 0. A process of model selection that allowed item
profile components to be fixed would look for item
profiles for which:

• a large number of individual item profile
components are 0

• the AIC is low (or out of sample predictions are
accurate).

The advantages of this approach are:

• it is easier to interpret the item profiles when
more item profile components are 0

• for the same number of components the AIC will be
lower, potentially giving more accurate predictions

• it is possible to increase the number of components
whilst continuing to reduce the AIC, potentially
giving more accurate predictions.

The LISREL 8 handbook describes in detail how to
estimate models with fixed parameters. It will be clear
how to modify the steps to accommodate this.



8.3.1 Initial values

Schemes for selecting a model will typically require an
initial set of parameter restrictions. One method for
generating this is to:

1. estimate parameters for the case where no item
profile components are restricted

2. choose a rotation of the item profiles, from
amongst those that leave the likelihood unchanged,
which gives simple structure

3. fix those item profile components which are small
in the resulting model to be zero.






7.3. Selection bias 



In some examples data about some items will record the
suitability of the item rather than simply whether the
item has been sampled or not. In these cases the
suitability is only recorded for those items that have
been sampled. If there is a correlation between the
suitability of an item, and whether or not it is
sampled, then models that fit the observed data may be
subject to selection bias. The models will fit
suitability conditional on selection, whereas we may
want to base predictions on the unconditional
suitability.

A known method of dealing with selection bias is
described in Moustaki (2000). The data in this example
is binary, with some missing values, and where values
are not missing at random.

An alternative way to think about this is to note that
in some cases it is sensible to think that whether or
not an observation is missing does depend on the case
profile.

One way to deal with selection bias is to specify the
estimation function as being a combination of two other
functions. The first models whether or not the item has
been selected and an observation is present. The second
models the observation, unconditional on its being
present. Predictions about missing observations (the
recommendation function) will be based on this model of
unconditional observations.

This method can be implemented using known techniques
for correcting for selection bias in the F module (where
case profiles are treated as known and the goal is to
estimate the item profiles) such as Heckman regression.
Preferably all components in the case profiles enter
into the model of selection and at least one component
of a case profile does not enter into the model of
ratings. In addition, the components of the item profile
that enter into the selection model are different from
those that enter into the model of unconditional
observations.

O'Muircheartaigh and Moustaki (99), "Symmetric pattern
models: a latent variable approach to item non-response
in attitude scales", Journal of the Royal Statistical
Society (1999) 162 part 2, pp 177-194, give an example
of a method for dealing with this. They suppose that
each observation is the result of two random variables,
a rating variable $y_r$ giving the observation unconditioned on
it being present, and a selection variable $y_s$ which
models whether the observation is present or missing.
Both depend on the case profile and are independent
conditional on this profile. The distributions are
$g(y_r \mid a_i, b_j)$ and $h(y_s \mid a_i, b_j)$. The authors estimate an
example model and predict values for the missing
variables, i.e. they show steps M through Y.

A step - use the models for both $y_r$ and $y_s$ to estimate a
user profile

Y step - when making recommendations, use the model
for $y_r$.

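10. Examples

In all of these examples the data is binary, and in most
the item model takes the form:

$$f(y_{ij} \mid a_i, b_j) = \begin{cases} \operatorname{logit}^{-1}\left(b_{j0} + \sum_{q=1}^{Q} a_{iq} b_{jq}\right) & \text{if } y_{ij} = 1 \\ 1 - \operatorname{logit}^{-1}\left(b_{j0} + \sum_{q=1}^{Q} a_{iq} b_{jq}\right) & \text{otherwise} \end{cases}$$

where

$$\operatorname{logit}^{-1}(x) = \frac{1}{1 + e^{-x}}$$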
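The binary logit item model can be written as a short function. This is a minimal sketch; the function names and the example profiles are illustrative assumptions.

```python
import math

def inv_logit(x):
    """Inverse logit: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def item_model(y, a_i, b_j):
    """f(y | a_i, b_j) for binary y, with b_j = (b_j0, b_j1, ..., b_jQ)."""
    p = inv_logit(b_j[0] + sum(a * b for a, b in zip(a_i, b_j[1:])))
    return p if y == 1 else 1.0 - p

# Hypothetical two-component case profile and item profile.
a_i = (0.5, -1.0)
b_j = (-0.2, 1.0, 0.3)
p_one = item_model(1, a_i, b_j)
p_zero = item_model(0, a_i, b_j)
```

The two probabilities sum to one for any profile, as required of a model for a 0/1 observation.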



10.1 Example 1

This example uses the approach 2 method. For each item
the model is

$$f(y_{ij} \mid a_i, b_j) = \begin{cases} s(a_{i1} b_{j1} + a_{i2} b_{j2}) & \text{if } y_{ij} = 1 \\ 1 - s(a_{i1} b_{j1} + a_{i2} b_{j2}) & \text{otherwise} \end{cases}$$

where $s(x) = \max\{0, \min\{1, x\}\}$.

We require that the user and object profiles belong to a
set of discrete values. This keeps the example simple.

$a_{iq} \in \{0, 0.25, 0.50, 0.75, 1\}$, $i = 1, \dots, 4$, $q = 1, 2$

$b_{jq} \in \{0, 0.25, 0.50, 0.75, 1\}$, $j = 1, \dots, 4$, $q = 1, 2$
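Because the profiles are restricted to a small discrete set, a case profile can be fitted by exhaustive search. The sketch below assumes that fitting means maximising the likelihood of the case's observations over the grid; that reading of the approach 2 method, the item profiles and the function names are assumptions made for illustration.

```python
from itertools import product

GRID = (0, 0.25, 0.5, 0.75, 1)

def s(x):
    """Clip x to the interval [0, 1]."""
    return max(0.0, min(1.0, x))

def item_model(y, a, b):
    """f(y | a, b) with two components and the clipped linear form."""
    p = s(a[0] * b[0] + a[1] * b[1])
    return p if y == 1 else 1.0 - p

def fit_case_profile(obs, items):
    """Return the grid profile (a_i1, a_i2) with the highest likelihood."""
    def likelihood(a):
        out = 1.0
        for y, b in zip(obs, items):
            out *= item_model(y, a, b)
        return out
    return max(product(GRID, repeat=2), key=likelihood)

# Hypothetical item profiles: items 1 and 3 load on component 1 only,
# items 2 and 4 on component 2 only.
items = [(1, 0), (0, 1), (1, 0), (0, 1)]
obs = [1, 0, 1, 0]
best = fit_case_profile(obs, items)
```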



10.2 Example 2

This example uses binary data, with item models based on
the logit function described above. Estimates of the
item profiles are made using the latent trait method
with full information maximum likelihood estimation.
The number of components is fixed to be 2.

Recommendations are made using the Bayesian method. The
case history is modified by setting all observations of
a 0 to be missing. We used the software package TWOMISS
to implement step B. The software is available on a
website of the publishers of Bartholomew and Knott (99),
arnoldpublishers.com/support/lvmfa2.htm. The program is
described in the document latv.pdl available on the
site. This document also contains a detailed
description of the model and the EM method of
estimation.

10.3 Example 3

This example is similar to example 2 but estimates the
item profiles by fitting the correlation matrix, and
chooses the number of components using the AIC.

10.4 Example 4

This is similar to 3 but includes a covariate treated as
an item.

10.5 Example 5

This example is similar to the above two, but uses the 2
stage method to estimate the item profiles.

10.6 Example 6

This example includes a covariate which is treated as an
item. This uses the London Attractions dataset,
including an additional binary variable which is 1 if
the average child age in the family is above 10 and 0
otherwise.

10.7 Example 7

This example uses PCA to estimate item profiles and make
recommendations.

10.8 Example 8

This example illustrates the A step for the Bayesian
method if a reduced set of case observations is used.

10.9 Example 9

This example imposes restrictions on the item profiles
to reflect prior knowledge of the latent variables.
This is an extension of the latent variable method II to
allow for different parameter restrictions. The example
shows how to estimate the β variables from the
underlying linear model. The transformation of these to
the item profiles of the original binary model is as
before.

It will be appreciated that the embodiments of the
invention described above are illustrative examples only
and that the scope of the invention is limited
only by the appended claims.
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Appendix A 



1.1 The set of items

The data in the database example describe visits to a
number of London Attractions. There are 20 attractions.
These attractions are labelled in various ways in what
follows. The labels, and the attraction identities,
are:

Label      Attraction               Number
BRIGHTON   Brighton                 1
CHESS      Chessington              2
NATGAL     National Gallery         3
HAMPTON    Hampton Court Gardens    4
SCIENCE    Science Museum           5
WHIPSNDE   Whipsnade                6
LEGO       Legoland                 7
EASTBORN   Eastbourne               8
LONAQUA    London Aquarium          9
WESTABBY   Westminster Abbey        10
KEW        Kew Gardens              11
LONZOO     London Zoo               12
MADTUS     Madam Tussauds           13
BRITMUS    British Museum           14
OXFORD     Oxford                   15
THORPE     Thorpe Park              16
NATHIST    Natural History Museum   17
TOWER      Tower of London          18
WINDSOR    Windsor Castle           19
WOBORN     Woburn Wildlife Park     20



1.2 The data set

The data records attendance at each attraction for 624
users. Each user is represented by a row in the data
set. The first column in the row is the first
attraction (Brighton), the second column is the second
attraction (Chessington) and so on. The data records
"1" if the user has visited the attraction in the past 4
years, and 0 otherwise. The following gives the first
10 records from the dataset (the full set is in Appendix
A). As an example, this data records that the first
user has visited Brighton and the National Gallery, but
not Chessington.



Extract begins

10111000111111101110
11111011111111011110
01111010011111111110
00111010111111101110
00101000111001001000
11111111111111111111
01111101110101001110
11011110011101011001
10101100001001101100
01111000001001001110

Extract ends



2.1 Derive pseudo-item profiles

To derive the item profiles from the data the program
S-PLUS was used. Three versions of their factor analysis
function were run, specifying 1, 2 and 3 factors
respectively. The following gives the S-PLUS call and
the output for the 2 factor version. These factors are
standardised.



Extract starts

> round(unclass(factanal(Dom.x[1:500, ], factors=2)$load), 3)

         Factor1  Factor2
bright     0.079    0.043
chess     -0.061    0.354
natgal     0.385   -0.087
hampt      0.241    0.006
science    0.332    0.064
whip       0.229    0.091
lego       0.065    [illegible]
east       [illegible]
lonaqu     [illegible]
westab     [illegible]
kew        0.377    [illegible]
lonzoo     [illegible]
madamt     [illegible]
britm      [illegible]
oxford     0.369    0.066
thorpe    -0.008    0.997
nathist    0.345    0.043
tower      0.425    0.003
wind       0.338    0.048
woburn     0.191    0.129

Extract ends

These factor loadings are taken as the item profiles.
Because the loadings are standardised, there is no $b_0$.
For example the item profile for Woburn is $(b_1, b_2) = (0.191, 0.129)$.



2.2 Generate estimates of the user profiles

For each user we used these factor loadings to generate
an estimated user profile. Component q in the profile
is equal to the sum of each observation multiplied by
component q in the relevant item profile, i.e.

$$\hat{a}_{iq} = \sum_{j} y_{ij} b_{jq}$$

These are available automatically from S-PLUS using the
scores parameter. The following shows the S-PLUS call and
the resulting scores for the first 5 users in the
database.

Extract begins

> factanal(Dom.x[1:500, ], scores='reg', factors=2)$scores[1:5, ]

     Factor1    Factor2
1 -0.1661745 -0.6675610
2 -0.6143931 -0.6655715
3 -0.7493019 -0.6639595
4 -0.5263396 -0.6660611
5 -0.3366707 -0.6651219

Extract ends

2.3 Generate Item Profiles

Using these estimated user profiles the item profiles
were generated. A logit regression function in S-PLUS,
glm, was called specifying the user profiles as the
independent variables. An example for Brighton is
shown.

Extract begins

Call: glm(formula = bright ~ f1 + f2, family = binomial(),
          data = big.dog2)

Coefficients:
 (Intercept)       f1       f2
    -0.66083  0.24780  0.09124

Degrees of Freedom: 499 Total (i.e. Null); 497 Residual
Null Deviance: 642.4
Residual Deviance: 636.8    AIC: 642.8

Extract ends

The result gives the item profile for Brighton as
$(b_0, b_1, b_2) = (-0.661, 0.248, 0.091)$. The full set of
results are shown below. In this table the components
are listed in the order (1,2,0).
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Extract begins 







              [,1]          [,2]         [,3]
 [1,]   0.24779997   0.091235765  -0.66082865
 [2,]  -0.21544381   0.754903543  -0.18170548
 [3,]   1.53636908  -0.424177397  -1.75295313
 [4,]   0.80029653  -0.001894496  -1.05189359
 [5,]   1.50012265   0.194537695   0.06676404
 [6,]   0.77903453   0.221078866  -1.65736390
 [7,]   0.20997573   0.338806740  -0.08729226
 [8,]   0.51292535   0.066094474  -2.41805007
 [9,]   0.70743844  -0.012873143  -0.91289761
[10,]   1.06350153  -0.321008989  -2.69301485
[11,]   1.40188843   0.111778939  -1.61679712
[12,]   0.89624918   0.328477350  -0.05714305
[13,]   0.86897447   0.217827415  -1.59056044
[14,]   2.09201506  -0.098552427  [illegible]
[15,]   1.42967216   0.145618309  -2.61659654
[16,]  -0.09497242  10.697211868  -4.48776360
[17,]   1.44575482   0.123545459  -0.25139096
[18,]   1.73629559  -0.067640956  -1.44709209
[19,]   1.23460197   0.088305200  -2.07386916
[20,]   0.75330360   0.410859138  -2.63379257

Extract ends 
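The Brighton figures above can be cross-checked with a few lines of arithmetic. The sketch below relies on the standard fact that, for ungrouped binary data, the residual deviance of a logit regression equals minus twice the maximised log-likelihood, so that the AIC is the deviance plus twice the number of coefficients. It also evaluates the fitted logit model at the first user's estimated profile from section 2.2; the variable names are illustrative.

```python
import math

# Figures reported in the Brighton glm extract.
residual_deviance = 636.8
n_coefficients = 3                    # intercept, f1, f2

# AIC = -2 * log-likelihood + 2p = residual deviance + 2p for binary data.
aic_brighton = residual_deviance + 2 * n_coefficients

# Fitted probability that the first user (scores from section 2.2) has
# visited Brighton under the estimated item profile.
b0, b1, b2 = -0.66083, 0.24780, 0.09124
a1, a2 = -0.1661745, -0.6675610
eta = b0 + a1 * b1 + a2 * b2
p_brighton = 1.0 / (1.0 + math.exp(-eta))
```

The recomputed AIC agrees with the value printed in the extract, which suggests the extract has been transcribed consistently.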



2.4 Choose the number of components

The steps above were performed for 1, 2 and 3 components
respectively, and the AIC was compared in each case.
The AIC was calculated as the sum of the AIC for the
logit regressions. The results were:

Components   AIC
1            10348.77
2            10276.46
3            10370.49

The lowest value of the AIC is for 2 components (where
the constant term $b_0$ is not included as a component), and
this model is used to make recommendations.
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Once the item profiles have been generated they are used 
to make recommendations in the on-line recommendation 
engine. The following gives an example for a single
user. The routines to implement the steps were written
in S-Plus, a widely available statistical package.

3.1 User history

The information set on which recommendations are based
gives the visiting history of the user. This is:

bright chess natgal hampt science whip lego east lonaqu westab kew
     0     0      1     1       1    0    0    0      0      0   0

lonzoo madamt britm oxford thorpe nathist tower wind woburn
     0      0     0      0      0       0     0    0      0

3.2 Prior distribution over possible user profiles

This history is used to update a prior distribution over
possible user profiles. The first task is to specify
the possible profiles. Each possible profile requires
two numbers. In this example the possible profiles are:

       [,1]  [,2]
 [1,]   -2    -2
 [2,]   -2    -1
 [3,]   -2     0
 [4,]   -2     1
 [5,]   -2     2
 [6,]   -1    -2
 [7,]   -1    -1
 [8,]   -1     0
 [9,]   -1     1
[10,]   -1     2
[11,]    0    -2
[12,]    0    -1
[13,]    0     0
[14,]    0     1
[15,]    0     2
[16,]    1    -2
[17,]    1    -1
[18,]    1     0
[19,]    1     1
[20,]    1     2
[21,]    2    -2
[22,]    2    -1
[23,]    2     0
[24,]    2     1
[25,]    2     2

The probability of each possible profile that is assumed
in the prior distribution is then specified. Here a
binomial approximation is used having a sample size of
4. (The following should be read as: the probability
of the first profile is 0.0039, the probability of the
second is 0.0156, the probability of the third is 0.0234
and so on).

 [1] 0.00390625 0.01562500 0.02343750 0.01562500 0.00390625
 [6] 0.01562500 0.06250000 0.09375000 0.06250000 0.01562500
[11] 0.02343750 0.09375000 0.14062500 0.09375000 0.02343750
[16] 0.01562500 0.06250000 0.09375000 0.06250000 0.01562500
[21] 0.00390625 0.01562500 0.02343750 0.01562500 0.00390625
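The prior probabilities listed above can be reproduced from a binomial distribution with sample size 4 and success probability one half, shifted to be centred on 0 and applied independently to each of the two components. The following sketch shows the calculation; the names are illustrative.

```python
from itertools import product
from math import comb

VALUES = (-2, -1, 0, 1, 2)

def component_prior(a):
    """Binomial(4, 1/2) probability of a + 2 successes."""
    return comb(4, a + 2) / 2 ** 4

# Profiles in the same order as the table of possible profiles:
# the first component varies slowest.
profiles = list(product(VALUES, repeat=2))
prior = [component_prior(a1) * component_prior(a2) for a1, a2 in profiles]
```

The first three entries come out as 0.00390625, 0.015625 and 0.0234375, and the central profile (0, 0) has probability 0.140625, matching the listing.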



3.3 Posterior distribution over possible user profiles

Having specified the prior distribution, the likelihood
of each profile is updated using Bayesian updating in
the light of the user's visiting history. In doing so
non-visits are treated as missing data.

 [1] 3.922150e-04 8.512675e-04 5.726658e-04 2.415706e-07 4.340733e-13
 [6] 3.134620e-02 6.4946??3e-02 4.081062e-02 1.708743e-05 2.670556e-11
[11] 2.021309e-01 3.855605e-01 2.137281e-01 8.269622e-05 1.037207e-10
[16] 1.588965e-02 2.881321e-02 1.474086e-02 5.554259e-06 5.891024e-12
[21] 3.3?8585e-06 5.536305e-06 2.669398e-06 1.052816e-09 1.057896e-15
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3.4 Probability of a visit 

This posterior distribution over possible user profiles
is then used to work out the likelihood of a visit to
each attraction. The probability of a visit to
Brighton, say, is calculated by working out, for each
possible profile, what the probability of visiting
Brighton is, and then weighting each of these using the
probability that the user's profile is the relevant one.
The result is:

 [1] 0.4120460 0.3744845 0.5589836 0.4939777 0.8384324 0.3434113
 [7] 0.5307790 0.1500989 0.4989128 0.2402854 0.5357991 0.7198547
[13] 0.3845266 0.5670006 0.3378800 0.2552298 0.7929130 0.6537655
[19] 0.3924300 0.1675236



3.5 Make a recommendation

The recommended attraction is that one with the highest
probability of a visit, but which has not yet been
visited. The attraction with the highest probability of
a visit is number 5, the Science Museum. The user has
already visited this, however, and it is not recommended.
The recommendation is item 17, the Natural History
Museum. The expected probability is 0.793.
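The recommendation rule can be sketched using the visiting history from section 3.1 and the visit probabilities from section 3.4. The second probability is transcribed here as 0.3744845, which is a reading of a partly garbled figure; the function name is illustrative.

```python
NAMES = ["bright", "chess", "natgal", "hampt", "science", "whip", "lego",
         "east", "lonaqu", "westab", "kew", "lonzoo", "madamt", "britm",
         "oxford", "thorpe", "nathist", "tower", "wind", "woburn"]

# 1 = visited, 0 = not visited (section 3.1).
history = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Probability of a visit to each attraction (section 3.4).
probs = [0.4120460, 0.3744845, 0.5589836, 0.4939777, 0.8384324, 0.3434113,
         0.5307790, 0.1500989, 0.4989128, 0.2402854, 0.5357991, 0.7198547,
         0.3845266, 0.5670006, 0.3378800, 0.2552298, 0.7929130, 0.6537655,
         0.3924300, 0.1675236]

def recommend(probs, history):
    """Zero-based index of the most probable attraction not yet visited."""
    candidates = [j for j, visited in enumerate(history) if not visited]
    return max(candidates, key=lambda j: probs[j])

best = recommend(probs, history)
item_number = best + 1                # attraction numbers start at 1
```

The unrestricted maximum is the Science Museum, which the user has already visited, so the rule falls through to the next highest probability.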

Appendix B

1.1 The set of items

The data in the example describe visits to a number of
London Attractions. There are 20 attractions.

1.2 Create different sets of items

The attractions were divided into two classes, one for
outdoor attractions and one for indoor attractions since
it might be thought that people look for different
things when visiting attractions in the different
classes. Outdoor ones are labelled "o" and indoor ones
labelled "i". The labels, and the attraction
identities, are:



Label      Attraction               Number   Class
BRIGHTON   Brighton                 1        o
CHESS      Chessington              2        o
NATGAL     National Gallery         3        i
HAMPTON    Hampton Court Gardens    4        o
SCIENCE    Science Museum           5        i
WHIPSNDE   Whipsnade                6        o
LEGO       Legoland                 7        o
EASTBORN   Eastbourne               8        o
LONAQUA    London Aquarium          9        i
WESTABBY   Westminster Abbey        10       i
KEW        Kew Gardens              11       o
LONZOO     London Zoo               12       o
MADTUS     Madam Tussauds           13       i
BRITMUS    British Museum           14       i
OXFORD     Oxford                   15       o
THORPE     Thorpe Park              16       o
NATHIST    Natural History Museum   17       i
TOWER      Tower of London          18       i
WINDSOR    Windsor Castle           19       o
WOBORN     Woburn Wildlife Park     20       o

1.3 The data set 

The data records attendance at each attraction for 624
users. Each user is represented by a row in the data
set. The first column in the row is the first
attraction (Brighton), the second column is the second
attraction (Chessington) and so on. The data records
"1" if the user has visited the attraction in the past 4
years, and "0" otherwise. The following gives the first
10 records from the dataset (the full set is in an
appendix). As an example, this data records that the
first user has visited Brighton and the National
Gallery, but not Chessington.
Extract begins

10111000111111101110
11111011111111011110
01111010011111111110
00111010111111101110
00101000111001001000
111111111111111111 

01111101110101001 
1101111001110101100 

10 10101100001001101100 
01111000001001001110 



1 1 

110 

1 



-Extract ends- 



2.1 Derive pseudo-item profiles for each class separately

For each class the pseudo-item profiles were derived
using a factor analysis call in S-PLUS specifying 2
factors. The following gives the results for the
outdoor attractions. In this view only factor loadings
that are above a minimum threshold are shown.



Extract begins

Factor1 Factor2

bright
chess 0.335
hampt 0.342
whip 0.180
lego 0.136 0.177
east
kew 0.449
lonzoo 0.127 0.205
oxford 0.421
thorpe 0.995
wind 0.423
woburn 0.118

Extract ends

These factor loadings are taken as the item profiles.
Because the loadings are standardised, there is no b_0.
For example the item profile for Woburn is (b_1, b_2) =
(0, 0.118).

Pseudo-item profiles for the indoor attractions were 
derived in a similar way to give: 

Extract begins

         Factor1 Factor2
natgal     0.286   0.314
science    0.632
lonaqu     0.218
westab             0.427
madamt             0.295
britm      0.321   0.439
nathist    0.500   0.131
tower      0.132   0.436

Extract ends

2.2 Generate estimates of the user profiles

For each user these factor loadings were used to
generate an estimated user profile for each group
separately. Component q in the profile is equal to the
sum of each observation multiplied by component q in the
relevant item profile, i.e.

a_q = \sum_j h_j b_{jq}

These are available automatically from S-PLUS using the
score parameter. The following shows the S-PLUS call and
the resulting scores for the first 5 users in the
database for the outdoor attractions.

Extract begins

> factanal(Dom.x[1:500, air == 'o'], scores = 'reg',
factors = 2)$scores

     Factor1     Factor2
1 -0.6232562 -0.36748994
2 -0.6089289 -0.44638126
3 -0.6333564 -0.23152621
4 -0.6208385 -0.36168293
5 -0.6822305  0.10715258

Extract ends



User profiles in respect of the indoor attractions were
calculated in a similar manner. The total user profile
combines the two. It has four components, two from the
indoor attractions and two from the outdoor ones.
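
The sum described in section 2.2, and the joining of the two class profiles into one four-component profile, might be sketched as below. This is a Python illustration under stated assumptions, not the S-PLUS score calculation: blank indoor loadings are taken as 0, the outdoor loadings and both user histories are hypothetical, and the order in which the two halves are joined is a free choice. Regression scores returned by S-PLUS may be scaled differently, so the numbers are not expected to match the extract.

```python
def pseudo_profile(history, loadings):
    """Return a_q = sum_j h_j * b_jq for every component q."""
    n_components = len(loadings[0])
    return [sum(h * row[q] for h, row in zip(history, loadings))
            for q in range(n_components)]


# Indoor loadings as printed above; blank cells are taken as 0 (assumption).
INDOOR = [
    (0.286, 0.314),  # natgal
    (0.632, 0.0),    # science
    (0.218, 0.0),    # lonaqu
    (0.0, 0.427),    # westab
    (0.0, 0.295),    # madamt
    (0.321, 0.439),  # britm
    (0.500, 0.131),  # nathist
    (0.132, 0.436),  # tower
]

# Hypothetical outdoor loadings for two attractions, for illustration only.
OUTDOOR_DEMO = [(0.4, 0.0), (0.0, 0.3)]

indoor_history = [1, 1, 0, 0, 0, 0, 1, 0]  # hypothetical user
outdoor_history = [1, 0]                   # hypothetical user

# Total profile: two outdoor components followed by two indoor components.
total_profile = (pseudo_profile(outdoor_history, OUTDOOR_DEMO)
                 + pseudo_profile(indoor_history, INDOOR))
print(total_profile)
```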

2.3 Generate Item Profiles

Using these estimated user profiles the item profiles
were generated. A logit regression function in S-PLUS,
glim, was called specifying the user profiles as the
independent variables. The full set of results is
shown below. In this table the components are listed in
the order (1,2,3,4,0).

Extract begins

> matrix(unlist(lapply(dimnames(Dom.x)[[2]], do.in.out)),
ncol=5)

             [,1]        [,2]        [,3]          [,4]         [,5]
 [1,] -0.66497682  0.06631292 -0.94866420 -1.6587867149 -0.443933558
 [2,] -0.14224857  8.61834093  0.84786846  0.1258775729  3.421769372
 [3,]  0.16070782 -1.44241195 -0.04910719  1.3299388583  0.264559297
 [4,]  0.05639791  0.11898905 -0.08425662  0.2725675719  0.004498342
 [5,]  0.33026646  0.20881792  0.26471087 -0.0338485436 -0.236691297
 [6,] -0.18430768 -1.72651454 -6.92681004 -3.2661175617 -1.591378576
 [7,] -0.12763604  0.20989516 -3.23738624  2.0482587025  0.073698981
 [8,]  0.16046396 -0.22394473  6.31290092  3.5461147033  2.690590592
 [9,]  0.80989483  0.06323751 -0.37184738  0.0014233164 -0.002682853
[10,] -0.25525493  1.17491048  0.62420648 -0.6601784440  0.371346177
[11,] -1.63613752 -0.08602790 -2.00233330 -3.3374396600 -2.655359233
[12,]  1.21738255  0.03825106  0.07490919 -0.6161212026 -0.819341155
[13,]  1.21257946 -0.49036764 -0.34287230  0.0660361639  0.285405279
[14,] -0.46608714  0.23134578 -0.28247497 -0.1965370782 -0.224963948
[15,]  0.05155904  0.95326279  2.89985604  2.9202511713  2.699170241
[16,] -1.14495536 -2.42700804 -0.06364561 -4.4877205744 -2.755308580
[17,]  0.10751957 -0.14824210  0.44152766 -0.0002659749  0.018338347
[18,] -0.29253927  0.30650048 -0.05671760  0.0001933553 -0.209695788
[19,] -0.22787088  0.01015998  0.18361485 10.6113818822  0.262801694
[20,]  1.55867871  0.50430103  0.93072996  1.3554356391  1.267106002



Extract ends 
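
By way of illustration, a fitted item profile of this kind might be turned into a visit probability as sketched below. The sketch is in Python and rests on assumptions: that each printed row holds the coefficients for one item in the stated order (1,2,3,4,0), that the constant comes last, and that the standard logit link applies. The user profile values are hypothetical.

```python
import math


def visit_probability(item_profile, user_profile):
    """Logit model: p = 1 / (1 + exp(-(b_0 + sum_q b_q * a_q))).

    `item_profile` is given in the order (b_1, ..., b_4, b_0), i.e. the
    constant term is the last element (assumed reading of the table).
    """
    *slopes, intercept = item_profile
    z = intercept + sum(b * a for b, a in zip(slopes, user_profile))
    return 1.0 / (1.0 + math.exp(-z))


# First printed row, read as one item's coefficients.
item_1 = (-0.66497682, 0.06631292, -0.94866420, -1.6587867149, -0.443933558)

neutral_user = (0.0, 0.0, 0.0, 0.0)          # hypothetical: all components zero
keen_user = (-0.62, -0.37, 0.0, 0.0)         # hypothetical profile values

print(visit_probability(item_1, neutral_user))
print(visit_probability(item_1, keen_user))
```

With all profile components at zero the probability depends on the constant term alone, which gives a quick check on how a row has been read.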

Appendix C 

1.1 The set of items

The data in the example describe visits to a number of
London Attractions. There are 20 attractions. The data
also include an additional binary variable which
records whether or not the user's children have an
average age of 10 or above (all users are
assumed to have school age children). These attractions
and the child-age variable are labelled in various ways
in what follows. The labels, and the attraction
identities, are:



Label     Attraction                             No.
BRIGHTON  Brighton                                1
CHESS     Chessington                             2
NATGAL    National Gallery                        3
HAMPTON   Hampton Court Gardens                   4
SCIENCE   Science Museum                          5
WHIPSNDE  Whipsnade                               6
LEGO      Legoland                                7
EASTBORN  Eastbourne                              8
LONAQUA   London Aquarium                         9
WESTABBY  Westminster Abbey                      10
KEW       Kew Gardens                            11
LONZOO    London Zoo                             12
MADTUS    Madam Tussauds                         13
BRITMUS   British Museum                         14
OXFORD    Oxford                                 15
THORPE    Thorpe Park                            16
NATHIST   Natural History Museum                 17
TOWER     Tower of London                        18
WINDSOR   Windsor Castle                         19
WOBORN    Woburn Wildlife Park                   20
CH.10     Average age of children is 10 or more  21





1.2 The data set 

The data records attendance at each attraction for 624
users. Each user is represented by a row in the data
set. The first column in the row is the first
attraction (Brighton), the second column is the second
attraction (Chessington) and so on. The data records
"1" if the user has visited the attraction in the past 4
years, and "0" otherwise. The following gives the first
10 records from the dataset (the full set is in Appendix
B). As an example, this data records that the first
user has visited Brighton and the National Gallery, but
not Chessington.

Extract begins 



0 


0 


1 


1 


1 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


1 


0 


1 


1 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


1 


1 


0 


0 


0 


1 


1 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


1 


0 


0 


0 


0 


1 


0 


0 


0 


0 


1 


0 


0 


0 


0 


0 


0 


0 


0 


1 


0 


1 


0 


0 


0 


1 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


1 


0 


0 


0 


0 


0 


0 


0 


0 


0 


1 


1 


0 


0 


0 


0 


0 


0 


0 


1 


1 


1 


1 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


1 


0 


0 


0 


0 


0 


0 


1 


0 


1 


0 


0 


0 


0 


0 


0 


0 


1 


0 


0 


0 


0 


1 


1 


0 


1 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


1 


0 


1 


0 


0 


0 


0 


0 


0 


1 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 



Extract ends

2.1 Derive pseudo-item profiles

The pseudo-item profiles were derived using a factor
analysis call in S-PLUS specifying 2 factors. Only the
data on attractions, and not on average child age, was
used in the factor analysis.

The following gives the resulting standardised factor
loadings.



Extract begins

> factanal(Dom.x[1:500, ], factors=2)$load
Loadings:

         Factor1 Factor2
bright
chess              0.354
natgal     0.385
hampt      0.241
science    0.332
whip       0.229
lego               0.165
east       0.121
lonaqu     0.216
westab     0.259
kew        0.377
lonzoo     0.237   0.140
madamt     0.256
britm      0.476
oxford     0.369
thorpe             0.997
nathist    0.345
tower      0.425
wind       0.338
woburn     0.191   0.129

Extract ends



WO 02/10954 



PCT/GB01/03383 




These factor loadings are taken as the item profiles.
Because the loadings are standardised, there is no b_0.
For example the item profile for Woburn is (b_1, b_2) =
(0.191, 0.129).

2.2 Generate estimates of the user profiles

For each user these factor loadings were used to
generate an estimated user profile for each group
separately. Component q in the profile is equal to the
sum of each observation multiplied by component q in the
relevant item profile, i.e.

a_q = \sum_j h_j b_{jq}

These are available automatically from S-PLUS using the
score parameter. The following shows the S-PLUS call and
the resulting scores for the first 5 users in the
database for the outdoor attractions.



Extract begins

> factanal(Dom.x[1:500, ], scores = 'reg',
factors=2)$scores[1:5, ]

     Factor1    Factor2
1 -0.1661745 -0.6675610
2 -0.6143931 -0.6655715
3 -0.7493019 -0.6639595
4 -0.5263396 -0.6660611
5 -0.3366707 -0.6651219

Extract ends



2.3 Generate Item Profiles

Using these estimated user profiles the item profiles
were generated. A logit regression function in S-PLUS,
glim, was called specifying the user profiles as two of
the independent variables. Average child age was also
specified as a third independent variable. This means
that the logit regressions yield 4 parameter estimates
each. One is the constant term b_0. Two relate to the user
profile derived via the pseudo-item profiles of the
attractions, and one relates to the average child age
variable. The full results are:



Extract begins 

 [1,]  0.2461899  0.08957790  0.025417992 -0.66819314
 [2,] -0.3047198  0.72615861  1.150155164 -0.51824073
 [3,]  1.5229507 -0.45950123  0.446952740 -1.89215801
 [4,]  0.8353290  0.02789901 -0.467996396 -0.92878458
 [5,]  1.5013147  0.19678912 -0.042031655  0.07848287
 [6,]  0.7973976  0.23770797 -0.238861189 -1.59388460
 [7,]  0.2470988  0.38253475 -0.592481225  0.08158206
 [8,]  0.5837931  0.12096454 -0.769423312 -2.24451270
 [9,]  0.7443689  0.01839470 -0.494524151 -0.78180470
[10,]  1.0643638 -0.32004482 -0.010331299 -2.69010465
[11,]  1.4131604  0.12360087 -0.185885413 -1.56747270
[12,]  0.9490218  0.38215384 -0.782284912  0.16017343
[13,]  0.8383658  0.16192526  0.852735719 -1.87539562
[14,]  2.0868181 -0.12670931  0.403985870 -2.46859509
[15,]  1.4829560  0.18784714 -0.563594639 -2.49006514
[16,] -0.0946940 10.69750731 -0.004585096 -4.48642779
[17,]  1.4456744  0.12339996  0.002653749 -0.25213316
[18,]  1.7506924 -0.12216716  0.843728615 -1.72089561
[19,]  1.2426287  0.09639704 -0.113571691 -2.04350959
[20,]  0.7927236  0.44133683 -0.391512108 -2.53944885

Extract ends 
Appendix D



User histories 
> hl.20 



b 




r ii 
W 


[>2] 


[>3] 


[A] 


[>5] 








0 


1 


0 


0 




[2J 




0 


0 


0 


0 




[3J 




0 


1 


0 


0 




Til 1 




1 


1 


0 


0 


1U 


[5J 


1 / 


0 
0 


1 
1 


0 
0 


0 

1 




[7J 


0 


0 


1 


0 


1 




ro t 

[S 5 ] 


0 


1 


1 


0 


1 




PJ 


0 


1 


1 


1 


1 


15 


[10,] 


0 


1 


1 


0 


1 




[11,] 




1 




0 


0 




[12,] 




0 




0 


0 




[13,] 




1 




0 


0 




[14,] 




1 




0 


0 


20 


[15,] 




0 




0 


0 




[16,] 




0 


0 


1 






[17,] 




0 


0 


1 






[18,] 




0 


0. 


0 






[19,] 




0 


0 


1 




25 


[20,] 




0 


1 


1 





Further examples are described below:

Example 1

> ex.1 _ ab(hl.20, tol=0.01, lambda=.5, mu=0.75)

Predicted user histories

> H(ex.1$a.prime, ex.1$b.prime)







U] 


L2] 


[>3] 


Ml 


L5] 


35 


[1J 


1 


0 


1 


0 


0 




[2,] 


0 


0 


0 


0 


0 




[3J 


1 


0 


1 


0 


0 



15 



[4J 


1 1 




0 


o 


PJ 


1 0 


J 


0 


o 


[6J 


1 1 


J 


0 


1 


PJ 


1 0 




o 


o 


[8,] 


1 0 


x 


o 


o 


PJ 


1 0 


x 


1 


1 

I 


[10,] 


1 0 


x 


o 


n 


[11 J 


1 1 


x 


o 


D 


[12,] 


0 0 




o 


A 

u 


[13J 


1 1 


x 


o 


o 


[14,] 


1 I 




o 


n 


[15,] 


1 0 


J 


o 


A 

u 


[16,] 


1 0 


0 


1 




[17J 


1 0 


0 


1 




[18J 


1 0 


0 


0 




[19J 


1 0 


0 


1 




[20J 


1 0 


1 


1 





Prediction errors

> sum(H(ex.1$a.prime, ex.1$b.prime) == 1 & hl.20 == 0)
[1] 5

> sum(H(ex.1$a.prime, ex.1$b.prime) == 0 & hl.20 == 1)
[1] 9

Normalised log-likelihood

> ex.1$norm.log.lik
[1] -0.3921817
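
The two error counts and the normalised log-likelihood reported above can be reproduced in outline as follows. This is a Python sketch, not the S-PLUS session; the function names and the small matrices are hypothetical and are not taken from the data set.

```python
import math


def prediction_errors(h, predicted):
    """Count cells predicted 1 but observed 0, and predicted 0 but observed 1."""
    pairs = [(x, y) for h_row, p_row in zip(h, predicted)
             for x, y in zip(h_row, p_row)]
    predicted_not_observed = sum(1 for x, y in pairs if y == 1 and x == 0)
    observed_not_predicted = sum(1 for x, y in pairs if y == 0 and x == 1)
    return predicted_not_observed, observed_not_predicted


def normalised_log_likelihood(likelihood):
    """Sum of log-likelihoods over all cells, divided by the number of cells."""
    n, p = len(likelihood), len(likelihood[0])
    return sum(math.log(v) for row in likelihood for v in row) / (n * p)


# Hypothetical 2 x 2 illustration.
h_demo = [[1, 0], [0, 1]]
predicted_demo = [[1, 1], [0, 0]]
likelihood_demo = [[0.5, 0.5], [0.5, 0.5]]

print(prediction_errors(h_demo, predicted_demo))
print(normalised_log_likelihood(likelihood_demo))
```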



Likelihood of the user histories

> Phi(hl.20, ex.1$a.prime, ex.1$b.prime)

           [,1]      [,2]      [,3]      [,4]      [,5]
 [1,] 0.8250856 0.5240304 0.8350231 0.8807971 0.7421196
 [2,] 0.4134032 0.7579803 0.5907615 0.8716424 0.8161381
 [3,] 0.8250856 0.5240304 0.8350231 0.8807971 0.7421196
 [4,] 0.8737172 0.5256501 0.8807972 0.8785969 0.7186375
 [5,] 0.8250856 0.5240304 0.8350231 0.8807971 0.7421196
 [6,] 0.9347387 0.4743499 0.8808021 0.6736149 0.5785726
 [7,] 0.3938034 0.7258131 0.4882028 0.7519964 0.3541521
 [8,] 0.2115889 0.4070667 0.7482299 0.8185183 0.3313691
 [9,] 0.1343897 0.2969896 0.5412996 0.7308824 0.8267741
[10,] 0.2115888 0.4070667 0.7482300 0.8185183 0.3313691
[11,] 0.8737172 0.5256501 0.8807972 0.8785969 0.7186374
[12,] 0.4134032 0.7579803 0.5907615 0.8716424 0.8161381
[13,] 0.8737172 0.5256501 0.8807972 0.8785969 0.7186375
[14,] 0.8737172 0.5256501 0.8807972 0.8785969 0.7186374
[15,] 0.8250857 0.5240304 0.8350231 0.8807971 0.7421196
[16,] 0.7457234 0.8312700 0.7736004 0.8807971 0.9003190
[17,] 0.7457234 0.8312700 0.7736004 0.8807971 0.9003190
[18,] 0.6643145 0.7610495 0.5984503 0.5202947 0.5831247
[19,] 0.7457234 0.8312700 0.7736004 0.8807971 0.9003190
[20,] 0.9758719 0.5418934 0.8153668 0.8738971 0.9449713

Parameter values - user profiles

> ex.1$a.prime
           [,1]        [,2]
 [1,] 0.9054134 0.000000000
 [2,] 0.4082206 0.021110260
 [3,] 0.9054134 0.000000000
 [4,] 1.0000000 0.005197485
 [5,] 0.9054134 0.000000000
 [6,] 1.0000000 0.318854833
 [7,] 0.4881923 0.222677935
 [8,] 0.7722939 0.123414736
 [9,] 0.5413661 0.749776003
[10,] 0.7722940 0.123414730
[11,] 1.0000000 0.005197531
[12,] 0.4082206 0.021110260
[13,] 1.0000000 0.005197486
[14,] 1.0000000 0.005197531
[15,] 0.9054135 0.000000000
[16,] 0.1927744 1.000000000
[17,] 0.1927744 1.000000000
[18,] 0.4002291 0.479694159
[19,] 0.1927745 1.000000000
[20,] 0.8712802 0.983966045



Parameter values - object profiles

> ex.1$b.prime
          [,1]         [,2]
[1,] 0.9805440 0.5799592265
[2,] 0.5256726 0.0000000000
[3,] 1.0000000 0.0000371357
[4,] 0.0000000 1.0000000000
[5,] 0.2603743 1.0000000000



Recommendation for user with current history c(0,1,1,0,0)

Calculate user profile

> a.only(c(0,1,1,0,0), ex.1$b.prime)$a.prime
[1] 0.6601747 0.0000000

Make recommendation

> R(c(0,1,1,0,0), a.only(c(0,1,1,0,0), ex.1$b.prime)$a.prime, ex.1$b.prime)$recommend

Example 2

> ex.2 _ ab(hl.20, tol=0.01, lambda=.5, mu=0.75)

Predicted user histories

> H(ex.2$a.prime, ex.2$b.prime)

     [,1] [,2] [,3] [,4] [,5]





[1,] 


1 0 


1 


0 


0 


10 


[2,] 


0 0 


0 


0 


0 




PJ 


1 0 


1 


0 


0 




[4J 


1 1 


,1 


0 


0 




[5,] 


1 0 


*1 


0 


0 




[6 3 ] 


1 1 




0 


1 


15 


[7J 


1 0 




0 


0 




[8J 


1 0 




0 


0 




[9,] 


1 0 




1 


1 




[10J 


1 0 




0 


0 




["J 


1 1 




0 


0 


20 


[12J 


0 0 




0 


0 




[13,] 


1 1 




0 


0 




[HJ 


1 1 




0 


0 




[15J 


1 0 




0 


0 




[16J 


1 0 


0 


1 


1 


25 


[17J 


1 0 


0 


1 


1 




[18,] 


1 0 


0 


0 


1 




[19,] 


1 0 


0 


1 


1 




[20J 


1 0 


1 


1 


1 



Prediction errors

> sum(H(ex.2$a.prime, ex.2$b.prime) == 1 & hl.20 == 0)
[1] 6

> sum(H(ex.2$a.prime, ex.2$b.prime) == 0 & hl.20 == 1)
[1] 6

Normalised log-likelihood

> ex.2$norm.log.lik
[1] -0.4064687

Likelihood of the user histories

> Phi(hl.20, ex.2$a.prime, ex.2$b.prime)



           [,1]      [,2]      [,3]      [,4]      [,5]
 [1,] 0.6340171 0.6228777 0.5417132 0.7324477 0.5088954
 [2,] 0.4419658 0.8807971 0.7884062 0.7221042 0.5996140
 [3,] 0.6340171 0.6228777 0.5417132 0.7324477 0.5088954
 [4,] 0.6268344 0.8751649 0.8892529 0.8661554 0.6496016
 [5,] 0.6340171 0.6228777 0.5417132 0.7324477 0.5088954
 [6,] 0.9338098 0.6756966 0.6893552 0.4223050 0.8711992
 [7,] 0.4327887 0.6330654 0.5061991 0.7608085 0.4309982
 [8,] 0.4259915 0.8754822 0.8807971 0.8806682 0.3063822
 [9,] 0.2070898 0.8175949 0.8859810 0.2268360 0.5567961
[10,] 0.4259915 0.8754822 0.8807971 0.8806682 0.3063822
[11,] 0.6268344 0.8751649 0.8892529 0.8661554 0.6496016
[12,] 0.4419658 0.8807971 0.7884062 0.7221042 0.5996140
[13,] 0.6268344 0.8751649 0.8892529 0.8661554 0.6496016
[14,] 0.6268344 0.8751649 0.8892529 0.8661554 0.6496016
[15,] 0.6340171 0.6228777 0.5417132 0.7324477 0.5088954
[16,] 0.8807971 0.8807971 0.6106311 0.5904962 0.8339121
[17,] 0.8807971 0.8807971 0.6106311 0.5904962 0.8339121
[18,] 0.8213265 0.8807971 0.6533716 0.4786965 0.7658134
[19,] 0.8807971 0.8807971 0.6106311 0.5904962 0.8339121
[20,] 0.9414221 0.6602454 0.7114509 0.5905965 0.8822130



Parameter values - user profiles

> ex.2$a.prime
            [,1]      [,2]
 [1,] 0.41946343 0.3792647
 [2,] 0.44170302 0.0000000
 [3,] 0.41946343 0.3792647
 [4,] 0.05553167 0.9992640
 [5,] 0.41946344 0.3792647
 [6,] 0.97756065 0.3204635
 [7,] 0.35605448 0.3682253
 [8,] 0.00000000 1.0000000
 [9,] 0.32656108 0.8860375
[10,] 0.00000000 1.0000000
[11,] 0.05553167 0.9992641
[12,] 0.44170302 0.0000000
[13,] 0.05553167 0.9992640
[14,] 0.05553167 0.9992641
[15,] 0.41946344 0.3792647
[16,] 1.00000000 0.0000000
[17,] 1.00000000 0.0000000
[18,] 0.88134012 0.0000000
[19,] 1.00000000 0.0000000
[20,] 1.00000000 0.3381018

Parameter values - object profiles

> ex.2$b.prime
          [,1]         [,2]
[1,] 1.0000000 0.5745561760
[2,] 0.0000000 0.9875815278
[3,] 0.3875086 1.0000000000
[4,] 0.5915042 0.0003067603
[5,] 0.9034027 0.2957280299

Recommendation for user with current history c(0,1,1,0,0)

Calculate user profile

> a.only(c(0,1,1,0,0), ex.2$b.prime)$a.prime
[1] 0.0000000 0.8741234

Make recommendation

> R(c(0,1,1,0,0), a.only(c(0,1,1,0,0),
ex.2$b.prime)$a.prime, ex.2$b.prime)$recommend
[1] 1

Example 3

> ex.3 _ ab(hl.20, tol=0.01, lambda=.5, mu=0.75)

Predicted user histories

> H(ex.3$a.prime, ex.3$b.prime)



35 





Li] 


L2] 


[>3] 


W 


15) 


[h] 




0 


1 


0 


0 


RJ 




0 


0 


0 


0 


[3 3 ] 




0 


1 


0 


0 


[4J 




0 


1 


0 


0 


[5J 




0 


1 


0 


0 


[6J 




0 


1 


0 


1 


[7,1 




0 


0 


0 


1 


[8J 




0 


1 


0 


1 


[9J 




0 


1 


1 


1 
[10,]


1 


0 


1 


0 


[11 J 


1 


0 


1 


0 


[12,] 


0 


0 


0 


0 


[13,] 


1 


0 


1 


0 


[14,] 


1 


0 


1 


0 


[15,] 


1 


0 


1 


0 


[16,] 


1 


0 


0 


1 


[17,] 


1 


0 


0 


1 


[18,] 


1 


0 


0 


0 


[19,] 


1 


0 


0 


1 


[20,] 


1 


0 


1 


1 



Prediction errors

> sum(H(ex.3$a.prime, ex.3$b.prime) == 1 & hl.20 == 0)
[1] 4

> sum(H(ex.3$a.prime, ex.3$b.prime) == 0 & hl.20 == 1)
[1] 10

Normalised log-likelihood

> ex.3$norm.log.lik
[1] -0.3932814

Likelihood of the user histories

> Phi(hl.20, ex.3$a.prime, ex.3$b.prime)
           [,1]      [,2]      [,3]      [,4]      [,5]
 [1,] 0.8807971 0.5512987 0.8806447 0.8807971 0.8134237
 [2,] 0.4578040 0.7647398 0.5423608 0.8807971 0.8530244
 [3,] 0.8807971 0.5512987 0.8806447 0.8807971 0.8134237
 [4,] 0.8809262 0.4487512 0.8806558 0.8801523 0.8123465
 [5,] 0.8807971 0.5512987 0.8806447 0.8807971 0.8134237
 [6,] 0.9078677 0.5395961 0.8832197 0.6380087 0.5459605
 [7,] 0.4803071 0.7609348 0.4472996 0.6039016 0.5141825
 [8,] 0.3198346 0.2954913 0.6031322 0.5435446 0.6046766
 [9,] 0.3116478 0.2798293 0.5390089 0.8115911 0.9069239
[10,] 0.3198346 0.2954913 0.6031322 0.5435446 0.6046766
[11,] 0.8809262 0.4487512 0.8806558 0.8801523 0.8123465
[12,] 0.4578040 0.7647398 0.5423608 0.8807971 0.8530244
[13,] 0.8809262 0.4487512 0.8806558 0.8801523 0.8123465
[14,] 0.8809262 0.4487512 0.8806558 0.8801523 0.8123465
[15,] 0.8807971 0.5512987 0.8806447 0.8807971 0.8134237
[16,] 0.5377219 0.7733681 0.6146786 0.7964475 0.8892863
[17,] 0.5377219 0.7733681 0.6146786 0.7964475 0.8892863
[18,] 0.5385306 0.7554185 0.5370044 0.5877765 0.5355289
[19,] 0.5377219 0.7733681 0.6146786 0.7964475 0.8892863
[20,] 0.9275260 0.5379658 0.8731563 0.7973894 0.9173102

Parameter values - user profiles



> ex.3$a.prime
           [,1]        [,2]
 [1,] 1.0000000 0.000000000
 [2,] 0.4577034 0.000000000
 [3,] 1.0000000 0.000000000
 [4,] 1.0000000 0.001770631
 [5,] 1.0000000 0.000000000
 [6,] 1.0000000 0.414193699
 [7,] 0.4404549 0.456091660
 [8,] 0.5969758 0.527508093
 [9,] 0.5243517 1.000000000
[10,] 0.5969757 0.527508094
[11,] 1.0000000 0.001770621
[12,] 0.4577034 0.000000000
[13,] 1.0000000 0.001770642
[14,] 1.0000000 0.001770642
[15,] 1.0000000 0.000000000
[16,] 0.3688663 0.972215602
[17,] 0.3688663 0.972215605
[18,] 0.4559963 0.475444315
[19,] 0.3688663 0.972215599
[20,] 0.9681038 0.973897501



Parameter values - object profiles

> ex.3$b.prime
          [,1]       [,2]
[1,] 1.0000000 0.17375507
[2,] 0.4485201 0.02849059
[3,] 0.9996374 0.01492679
[4,] 0.0000000 0.86509546
[5,] 0.1318970 1.00000000



Recommendation for user with current history c(0,1,1,0,0)

Calculate user profile

> a.only(c(0,1,1,0,0), ex.3$b.prime)$a.prime
[1] 0.6501714 0.0000000

Make recommendation

> R(c(0,1,1,0,0), a.only(c(0,1,1,0,0), ex.3$b.prime)$a.prime, ex.3$b.prime)$recommend
[1] 1

Appendix E 

S-PLUS functions

Iterative procedure to find a and b, user and object profiles, to maximise the
likelihood of user histories h. Take repeated steps of updating first the user
profiles then the object profiles until the improvement in the normalised
log-likelihood is less than the specified tolerance (argument tol). (User and
object profiles are vectors of length r.)

> ab
function(h, tol = 0.1, lambda = 1, mu = 1, r = 2, a = NULL, b = NULL)
{
    n <- nrow(h)
    p <- ncol(h)
    a <- rprof(n, 2)
    b <- rprof(p, 2)
    zz <- ab.min.log.Phi(h, a, b)
    rho <- zz$norm.log.lik[2]/zz$norm.log.lik[1]
    its <- 1
    while(rho < 1 - tol && its < 10) {
        zz <- ab.min.log.Phi(h, zz$a.prime, zz$b.prime, lambda, mu)
        rho <- zz$norm.log.lik[2]/zz$norm.log.lik[1]
        its <- its + 1
    }
    obj <- list(a = a, b = b, a.prime = zz$a.prime, b.prime = zz$b.prime,
        norm.log.lik = zz$norm.log.lik[2], iterations = its)
    attr(obj, "call") <- match.call()
    obj
}

Two-step process to maximise the log-likelihood of user histories h, first by
holding b fixed and maximising over user profiles a, then maximising over
object profiles b with updated user profiles a.prime. The second step
generates updated object profiles b.prime. For both user and object profiles,
the updated profile is a linear combination of the initial profile and the profile
generated by the optimisation procedure. (Arguments lambda and mu control
the linear combinations.) Each optimisation step is carried out by the
S-PLUS built-in function nlminb.



> ab.min.log.Phi
function(h, a, b, lambda = 1, mu = 1)
{
    n <- nrow(a)
    a.prime <- matrix(NA, nrow = nrow(a), ncol = ncol(a))
    a.mess <- character(n)
    for(i in 1:n) {
        zz <- nlminb(start = a[i, ], function(u, hi., b)
             - sum(log.Phi.i.(hi., u, b)), lower = 0, upper = 1, hi. = h[i, ], b = b)
        a.prime[i, ] <- lambda * zz$parameters + (1 - lambda) * a[i, ]
        a.mess[i] <- zz$mess
    }
    m <- nrow(b)
    b.prime <- matrix(NA, nrow = nrow(b), ncol = ncol(b))
    b.mess <- character(n)
    for(j in 1:m) {
        zz <- nlminb(start = b[j, ], function(u, h.j, a)
             - sum(log.Phi..j(h.j, a, u)), lower = 0, upper = 1, h.j = h[, j], a = a.prime)
        b.prime[j, ] <- mu * zz$parameters + (1 - mu) * b[j, ]
        b.mess[j] <- zz$mess
    }
    log.lik <- log.Phi(h, a, b)
    log.lik.prime <- log.Phi(h, a.prime, b.prime)
    list(a = cbind(a, a.prime), b = cbind(b, b.prime), norm.log.lik =
        c(sum(log.lik), sum(log.lik.prime))/(m * n),
        log.lik = cbind(log.lik, log.lik.prime), messages =
        c(a.mess, b.mess), a.prime = a.prime, b.prime = b.prime)
}
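
The two-step update can be illustrated outside S-PLUS as follows. This Python sketch is an assumption-laden stand-in, not the procedure above: nlminb is replaced by a simple projected gradient ascent that keeps each component in [0, 1], and the step size, step count, function names and the small history matrix are all hypothetical. Only the overall shape (update user profiles, then object profiles, each blended with the previous value) follows the description.

```python
import math
import random


def phi(t, lam=4.0):
    """Score function 1 / (1 + exp(-lam * (t - 0.5)))."""
    return 1.0 / (1.0 + math.exp(-lam * (t - 0.5)))


def log_lik(obs, profile, others):
    """Log-likelihood of 0/1 observations for one profile against the other side."""
    total = 0.0
    for h, o in zip(obs, others):
        p = phi(sum(x * y for x, y in zip(profile, o)))
        total += math.log(p if h == 1 else 1.0 - p)
    return total


def improve(obs, profile, others, steps=200, rate=0.02, lam=4.0):
    """Projected gradient ascent on log_lik (stand-in for nlminb)."""
    a = list(profile)
    for _ in range(steps):
        grad = [0.0] * len(a)
        for h, o in zip(obs, others):
            p = phi(sum(x * y for x, y in zip(a, o)), lam)
            for q in range(len(a)):
                grad[q] += lam * (h - p) * o[q]  # derivative of the log-likelihood
        a = [min(1.0, max(0.0, x + rate * g)) for x, g in zip(a, grad)]
    return a


def two_step(h, A, B, lam_mix=1.0, mu_mix=1.0):
    """Update user profiles A with B fixed, then object profiles B with new A."""
    A_new = []
    for i, a in enumerate(A):
        opt = improve(h[i], a, B)
        A_new.append([lam_mix * o + (1 - lam_mix) * x for o, x in zip(opt, a)])
    B_new = []
    for j, b in enumerate(B):
        column = [row[j] for row in h]
        opt = improve(column, b, A_new)
        B_new.append([mu_mix * o + (1 - mu_mix) * x for o, x in zip(opt, b)])
    return A_new, B_new


def norm_log_lik(h, A, B):
    """Total log-likelihood divided by the number of cells."""
    n, p = len(h), len(h[0])
    return sum(log_lik(h[i], A[i], B) for i in range(n)) / (n * p)


random.seed(0)
h_demo = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]]  # hypothetical histories
A0 = [[random.random(), random.random()] for _ in h_demo]
B0 = [[random.random(), random.random()] for _ in h_demo[0]]
A1, B1 = two_step(h_demo, A0, B0)
print(norm_log_lik(h_demo, A0, B0), norm_log_lik(h_demo, A1, B1))
```

Repeating two_step until the ratio of successive normalised log-likelihoods comes within the tolerance of 1 would mirror the stopping rule of the ab function.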

Log-likelihood of user profile ai given user history hi and object profiles
b.

> log.Phi.i.
function(hi, ai, b)
{
    p <- nrow(b)
    log.lik <- numeric(p)
    for(j in 1:p) {
        log.lik[j] <- log.Phi.ij(hi[j], ai, b[j, ])
    }
    log.lik
}

Log-likelihood of object profile bj given user histories h.j for object j and user
profiles a.

> log.Phi..j
function(h.j, a, bj)
{
    p <- nrow(a)
    log.lik <- numeric(p)
    for(i in 1:p) {
        log.lik[i] <- log.Phi.ij(h.j[i], a[i, ], bj)
    }
    log.lik
}

Log-likelihood of hij given user profile ai and object profile bj.

> log.Phi.ij
function(hij, ai, bj)
{
    log(Phi.ij(hij, ai, bj))
}

Likelihood of hij given user profile ai and object profile bj.

> Phi.ij
function(hij, ai, bj)
{
    ifelse(hij == 0, 1 - phi(sum(ai * bj)), phi(sum(ai * bj)))
}

Score function

> phi
function(t, lambda = 4)
{
    1/(1 + exp( - lambda * (t - 0.5)))
}
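
For readers without S-PLUS, the score function and the single-cell likelihood may be sketched in Python as below. The function names are illustrative only. The check values are taken from Example 1 in Appendix D (user profile 1, object profile 1, and the frequently occurring likelihood 0.8807971 for an unvisited object with zero score argument).

```python
import math


def phi(t, lam=4.0):
    """Score function: 1 / (1 + exp(-lam * (t - 0.5)))."""
    return 1.0 / (1.0 + math.exp(-lam * (t - 0.5)))


def likelihood(h_ij, a_i, b_j):
    """Likelihood of observation h_ij given user profile a_i and object profile b_j."""
    p = phi(sum(x * y for x, y in zip(a_i, b_j)))
    return p if h_ij == 1 else 1.0 - p


# Values read from Example 1: user 1 and object 1, observed as visited.
a_1 = (0.9054134, 0.0)
b_1 = (0.9805440, 0.5799592265)
print(likelihood(1, a_1, b_1))           # close to the tabulated 0.8250856

# An unvisited object whose score argument is 0.
print(likelihood(0, (0.0, 0.0), b_1))    # close to the tabulated 0.8807971
```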

Generate random profiles

> rprof
function(n, p)
{
    # uniformly distributed in positive quadrant of unit disk ??
    matrix(runif(n * p), nrow = n)
}

Generate predicted user histories

> H
function(a, b)
{
    n <- nrow(a)
    p <- nrow(b)
    zz <- matrix(NA, nrow = n, ncol = p)
    for(i in 1:n) {
        for(j in 1:p) {
            zz[i, j] <- phi(sum(a[i, ] * b[j, ]))
        }
    }
    ifelse(zz < 0.5, 0, 1)
}

Calculate user profile for a new user with history h given object profiles b

> a.only
function(h, b)
{
    p <- nrow(b)
    r <- ncol(b)
    a <- rprof(1, r)
    zz <- nlminb(start = a, function(u, h0, b)
         - sum(log.Phi.i.(h0, u, b)), lower = 0, upper = 1, h0 = h, b = b)
    a.prime <- zz$parameters
    log.lik <- log.Phi(h, a.prime, b)
    obj <- list(a = a, a.prime = a.prime, norm.log.lik = sum(log.lik)/p,
        messages = zz$message)
    attr(obj, "call") <- match.call()
    obj
}

Make a recommendation for a user with history h given user profile a and object
profiles b by choosing object not yet sampled with largest score

> R
function(h, a, b)
{
    if(all(h == 1))
        stop("'e's been everywhere already!!")
    p <- nrow(b)
    if(length(h) != p)
        stop("h and p out of whack!")
    score <- numeric(p)
    for(i in 1:p) {
        score[i] <- phi(sum(a * b[i, ]))
    }
    rho <- rev(order(score))
    i <- 1
    while(h[rho[i]] == 1) {
        i <- i + 1
    }
    list(score = score, order = rho, recommend = rho[i])
}
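
The recommendation step may be sketched in Python as follows, using the user profile and object profiles printed for Example 2 in Appendix D. The function names are illustrative, and the handling of tied scores here (lowest object number first) is an assumption that may differ from rev(order(score)) in S-PLUS.

```python
import math


def phi(t, lam=4.0):
    """Score function: 1 / (1 + exp(-lam * (t - 0.5)))."""
    return 1.0 / (1.0 + math.exp(-lam * (t - 0.5)))


def recommend(h, a, B):
    """Return (scores, 1-based number of the best-scoring object not yet sampled)."""
    if all(x == 1 for x in h):
        raise ValueError("every object has already been sampled")
    if len(h) != len(B):
        raise ValueError("history and object profiles differ in length")
    scores = [phi(sum(x * y for x, y in zip(a, b))) for b in B]
    ranked = sorted(range(len(B)), key=lambda j: (-scores[j], j))
    best = next(j for j in ranked if h[j] == 0)
    return scores, best + 1


# Example 2 values: object profiles and the profile found for history (0,1,1,0,0).
B_ex2 = [
    (1.0000000, 0.5745561760),
    (0.0000000, 0.9875815278),
    (0.3875086, 1.0000000000),
    (0.5915042, 0.0003067603),
    (0.9034027, 0.2957280299),
]
a_new = (0.0000000, 0.8741234)
history = (0, 1, 1, 0, 0)

scores, item = recommend(history, a_new, B_ex2)
print(item)  # prints: 1
```

Objects 2 and 3 score highest for this profile but have already been sampled, so the first unsampled object in score order, object 1, is returned, in agreement with the session output.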

Appendix F 

S-PLUS session log 

Complete session log of calculations for example 1 in file examples2.doc.
Initial values for the user and object profiles are chosen at random, several
two-stage optimisation steps are made and results are printed out.

> ex.1 _ ab(hl.20, tol=0.01, lambda=.5, mu=0.75)
> H(ex.1$a.prime, ex.1$b.prime)






[1J 
[2J 
[3J 
[4,] 

PJ 
[6J 
[7,] 
[8,] 
[9J 
[10,] 

[11 J 
[12,] 
[13,] 
[14,] 
[15,3 
[16,] 
[17,] 
[18,] 
[19,] 
[20,] 
Swm(H 



Ml 
1 



[,2] 
0 
0 
0 

1 

0 

1 

0 
0 
0 
0 

1 

0 

1 
1 

0 
0 
0 
0 
0 
0 



[,3] 

1 



0 
0 
0 
0 

1 



[,4] 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 
0 
0 
0 
0 
0 

1 
1 

0 

1 
1 



[,5] 

0 

0 

0 

0 

0 

1 

0 
0 

1 

0 
0 
0 
0 
0 
0 



> sum(H(ex.1$a.prime, ex.1$b.prime) == 1 & hl.20 == 0)
[1] 5

> sum(H(ex.1$a.prime, ex.1$b.prime) == 0 & hl.20 == 1)
[1] 9

> ex.1$norm.log.lik
[1] -0.3921817

> Phi.ij

function(hij, ai, bj)
{
    # likelihood of one observation hij given user profile ai and object profile bj
    ifelse(hij == 0, 1 - phi(sum(ai * bj)), phi(sum(ai * bj)))
}

> Phi

function(h, a, b)
{
    n <- nrow(h)
    p <- ncol(h)
    likelihood <- matrix(NA, nrow = n, ncol = p)
    for(i in 1:n) {
        for(j in 1:p) {
            likelihood[i, j] <- Phi.ij(h[i, j], a[i, ], b[j, ])
        }
    }
    likelihood
}
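For readers who prefer Python, the two functions above can be sketched as below. The link function `phi` is not shown in this listing, so the standard logistic function is assumed as a placeholder; the function names are illustrative and not part of the original routines.

```python
import math


def logistic(x):
    """Placeholder link function (assumption: the listing's phi is defined elsewhere)."""
    return 1.0 / (1.0 + math.exp(-x))


def phi_ij(hij, ai, bj, phi=logistic):
    """Likelihood of a single 0/1 observation hij for user profile ai and object profile bj."""
    p = phi(sum(x * y for x, y in zip(ai, bj)))
    # Mirrors ifelse(hij == 0, 1 - phi(...), phi(...)).
    return 1.0 - p if hij == 0 else p


def likelihood_matrix(h, a, b, phi=logistic):
    """n x p matrix of per-observation likelihoods.

    h -- n x p matrix of 0/1 observations
    a -- n user profiles (one row per user)
    b -- p object profiles (one row per object)
    """
    return [
        [phi_ij(h[i][j], a[i], b[j], phi) for j in range(len(b))]
        for i in range(len(a))
    ]


# Tiny example: one user, two identical objects, one visited and one not.
example = likelihood_matrix([[1, 0]], [[2.0]], [[1.0], [1.0]])
```

In the example the two entries are `phi(2)` and `1 - phi(2)`, which shows how a visit and a non-visit to otherwise identical objects contribute complementary likelihoods.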

> Phi(h1.20, ex.1$a.prime, ex.1$b.prime)
           [,1]      [,2]      [,3]      [,4]      [,5]
 [1,] 0.8350231 0.8250856 0.8807971 0.5240304 0.7421196
 [2,] 0.4134032 0.7579803 0.5907615 0.8716424 0.8161381
 [3,] 0.8250856 0.5240304 0.8350231 0.8807971 0.7421196
 [4,] 0.8737172 0.5256501 0.8807972 0.8785969 0.7186375
 [5,] 0.8250856 0.5240304 0.8350231 0.8807971 0.7421196
 [6,] 0.9347387 0.4743499 0.8808021 0.6736149 0.5785726
 [7,] 0.3938034 0.7258131 0.4882028 0.7519964 0.3541521
 [8,] 0.2115889 0.4070667 0.7482299 0.8185183 0.3313691
 [9,] 0.1343897 0.2969896 0.5412996 0.7308824 0.8267741
[10,] 0.2115888 0.4070667 0.7482300 0.8185183 0.3313691
[11,] 0.8737172 0.5256501 0.8807972 0.8785969 0.7186374
[12,] 0.4134032 0.7579803 0.5907615 0.8716424 0.8161381
[13,] 0.8737172 0.5256501 0.8807972 0.8785969 0.7186375
[14,] 0.8737172 0.5256501 0.8807972 0.8785969 0.7186374
[15,] 0.8250857 0.5240304 0.8350231 0.8807971 0.7421196
[16,] 0.7457234 0.8312700 0.7736004 0.8807971 0.9003190
[17,] 0.7457234 0.8312700 0.7736004 0.8807971 0.9003190
[18,] 0.6643145 0.7610495 0.5984503 0.5202947 0.5831247
[19,] 0.7457234 0.8312700 0.7736004 0.8807971 0.9003190
[20,] 0.9758719 0.5418934 0.8153668 0.8738971 0.9449713

> ex.1$a.prime
           [,1]        [,2]
 [1,] 0.9054134 0.000000000
 [2,] 0.4082206 0.021110260
 [3,] 0.9054134 0.000000000
 [4,] 1.0000000 0.005197485
 [5,] 0.9054134 0.000000000
 [6,] 1.0000000 0.318854833
 [7,] 0.4881923 0.222677935
 [8,] 0.7722939 0.123414736
 [9,] 0.5413661 0.749776003
[10,] 0.7722940 0.123414730
[11,] 1.0000000 0.005197531
[12,] 0.4082206 0.021110260
[13,] 1.0000000 0.005197486
[14,] 1.0000000 0.005197531
[15,] 0.9054135 0.000000000
[16,] 0.1927744 1.000000000
[17,] 0.1927744 1.000000000
[18,] 0.4002291 0.479694159
[19,] 0.1927745 1.000000000
[20,] 0.8712802 0.983966045

> ex.1$b.prime
NULL

> ex.1$b.prime
          [,1]         [,2]
[1,] 0.9805440 0.5799592265
[2,] 0.5256726 0.0000000000
[3,] 1.0000000 0.0000371357
[4,] 0.0000000 1.0000000000
[5,] 0.2603743 1.0000000000
>

> a.only(c(0, 1, 1, 0, 0), ex.1$b.prime)
$a:
          [,1]      [,2]
[1,] 0.7904475 0.1942631

$a.prime:
[1] 0.6601747 0.0000000

$norm.log.lik:
[1] -0.5728617

$messages:
[1] "RELATIVE FUNCTION CONVERGENCE"

attr(, "call"):
a.only(h = c(0, 1, 1, 0, 0), b = ex.1$b.prime)

> R(c(0, 1, 1, 0, 0), a.only(c(0, 1, 1, 0, 0), ex.1$b.prime)$a.prime, ex.1$b.prime)
$score:
[1] 0.6432096 0.3516359 0.6549116 0.1192029 0.2120806

$order:
[1] 3 1 2 5 4

$recommend:
[1] 1
Appendix G

This is an example of a numerical implementation of a preferred method of the
invention using user information, implemented using the alternative preferred
method based on tetrachoric correlations.

1. Specify the data

1.1 The set of items

The data in the example describe visits to a number of London attractions. There are 20
attractions. The data also include an additional binary variable which records whether or not
the user's children have an average age of 10 or above (all users are assumed to have
school-age children). These attractions and the child-age variable are labelled in various ways
in what follows. The labels, and the attraction identities, are:



BRIGHTON   Brighton                                1
CHESS      Chessington                             2
NATGAL     National Gallery                        3
HAMPTON    Hampton Court Gardens                   4
SCIENCE    Science Museum                          5
WHIPSNDE   Whipsnade                               6
LEGO       Legoland                                7
EASTBORN   Eastbourne                              8
LONAQUA    London Aquarium                         9
WESTABBY   Westminster Abbey                      10
KEW        Kew Gardens                            11
LONZOO     London Zoo                             12
MADTUS     Madam Tussauds                         13
BRITMUS    British Museum                         14
OXFORD     Oxford                                 15
THORPE     Thorpe Park                            16
NATHIST    Natural History Museum                 17
TOWER      Tower of London                        18
WINDSOR    Windsor Castle                         19
WOBORN     Woburn Wildlife Park                   20
CH.10      Average age of children is 10 or more  21

1.2 The data set

The data records attendance at each attraction for 624 users. Each user is represented by a
row in the data set. The first column in the row is the first attraction (Brighton), the second
column is the second attraction (Chessington) and so on. The data records "1" if the user has
visited the attraction in the past 4 years, and 0 otherwise. The following gives the first 10
records from the dataset (the full set is in an appendix). The final column records whether or
not the average child age in the family is 10 or above.
2. Generate the tetrachoric correlations

The tetrachoric correlations were calculated using PRELIS, which is distributed with
LISREL, a widely available statistical package. Following is a printout of the output file. The
figures should be read from left to right and give only the lower left triangle of the correlation
matrix. For example, the first number is the tetrachoric correlation between items (1,1), i.e.
between Brighton and Brighton, and so is 1 by definition. The second figure is the tetrachoric
correlation between items (2,1), i.e. between Chessington and Brighton. The third
figure is for items (2,2), and so on. The pattern is built up as:

1st (1,1)

2nd and 3rd (2,1) (2,2)

4th, 5th and 6th (3,1) (3,2) (3,3)...
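The reading order described above can be sketched in Python as follows. This is an illustrative helper, not part of PRELIS or of the original S-Plus routines: the function names are assumptions, and the only inputs used are the first six figures of the printout.

```python
import math


def parse_fortran_double(token):
    """Convert a Fortran D-exponent literal such as '0.25921D-01' to a float."""
    return float(token.upper().replace("D", "E"))


def unpack_lower_triangle(tokens):
    """Rebuild a symmetric matrix from its lower triangle read row by row.

    The tokens are ordered (1,1), (2,1), (2,2), (3,1), (3,2), (3,3), ...
    """
    values = [parse_fortran_double(t) for t in tokens]
    # n(n+1)/2 values make up the lower triangle of an n x n matrix.
    n = (math.isqrt(8 * len(values) + 1) - 1) // 2
    if n * (n + 1) // 2 != len(values):
        raise ValueError("number of values is not a triangular number")
    matrix = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            matrix[i][j] = matrix[j][i] = values[k]
            k += 1
    return matrix


# First six figures of the printout: items 1 to 3 (Brighton, Chessington, National Gallery).
first_six = "0.10000D+01 0.25921D-01 0.10000D+01 0.15903D+00 -0.95292D-02 0.10000D+01".split()
corr = unpack_lower_triangle(first_six)
```

With 21 variables (20 attractions plus the child-age variable) the full printout would contain 21 x 22 / 2 = 231 figures, and the same function would return the 21 x 21 matrix.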

Printout starts

0.10000D+01 0.25921D-01 0.10000D+01 0.15903D+00 -0.95292D-02 0.10000D+01
0.24066D+00 0.84937D-01 0.28213D+00 0.10000D+01 0.39210D-01 -0.90012D-01
0.38216D+00 0.23000D+00 0.10000D+01 0.21047D-02 0.31598D-01 0.14340D+00
0.44819D-01 0.90452D-01 0.10000D+01 -0.10435D+00 0.32529D-01 -0.11937D+00
0.34243D-01 0.91822D-01 0.12105D+00 0.10000D+01 0.16561D+00 0.76582D-01
0.85915D-01 0.44421D-02 -0.23282D-01 0.16856D+00 -0.23900D+00 0.10000D+01
0.93920D-02 -0.10186D+00 0.64973D-01 -0.16571D-01 0.20816D+00 0.47231D-01
0.17422D+00 -0.92999D-01 0.10000D+01 0.77810D-01 -0.31840D-01 0.36910D+00
0.14890D+00 -0.12013D-01 -0.23573D-01 -0.83981D-01 0.24296D+00 0.10375D+00
0.10000D+01 -0.95084D-02 0.11492D-01 0.33575D+00 0.37297D+00 0.25732D+00
0.48493D-01 0.10178D+00 -0.39985D-01 0.19402D+00 0.18485D+00 0.10000D+01
0.16800D-01 -0.76457D-01 0.27590D-01 0.51685D-01 0.23255D+00 0.11987D+00
0.19297D+00 -0.13336D-01 0.27748D+00 0.11772D+00 0.22651D+00 0.10000D+01
-0.92362D-02 0.20553D+00 0.16060D+00 0.18503D-02 0.81839D-01 0.85546D-01
-0.78074D-02 0.89379D-01 0.37150D-01 0.24369D+00 0.10690D+00 0.15442D+00
0.10000D+01 0.98167D-01 -0.19484D-01 0.51206D+00 0.22435D+00 0.34991D+00
0.76726D-01 -0.11389D+00 0.89222D-01 0.22704D+00 0.31159D+00 0.25272D+00
0.16967D+00 0.27032D+00 0.10000D+01 0.54877D-01 -0.10843D+00 0.30814D+00
0.22729D+00 0.12249D+00 0.14978D+00 -0.80009D-02 0.26167D-01 0.15371D+00
0.34307D+00 0.43455D+00 0.10852D+00 0.23818D+00 0.35848D+00 0.10000D+01
0.53346D-01 0.51364D+00 -0.13616D+00 -0.11254D-01 0.38080D-01 0.13179D+00
0.23852D+00 0.68837D-01 -0.53993D-01 -0.11013D+00 0.38208D-01 0.22842D+00
0.15026D+00 0.21440D-02 0.34106D-01 0.10000D+01 -0.12307D+00 -0.20600D-01
0.24943D+00 0.99045D-01 0.48249D+00 0.22156D+00 0.15389D+00 0.71481D-01
0.25974D+00 0.82698D-01 0.16346D+00 0.25823D+00 0.22793D+00 0.39315D+00
0.87080D-01 0.38362D-01 0.10000D+01 -0.14982D-01 -0.96054D-01 0.18464D+00
0.16839D+00 0.16761D+00 0.24899D+00 0.68591D-03 0.25407D+00 0.15389D+00
0.40308D+00 0.22768D+00 0.13627D+00 0.33529D+00 0.41978D+00 0.31096D+00
0.52853D-02 0.22597D+00 0.10000D+01 -0.46788D-01 0.90354D-02 0.19470D+00
0.29679D+00 0.18597D-01 0.17544D+00 0.32902D+00 0.39910D-01 0.12491D+00
0.33632D+00 0.24589D+00 0.14153D+00 0.24115D+00 0.23277D+00 0.43132D+00
0.95171D-01 0.47527D-01 0.42469D+00 0.10000D+01 0.11851D-01 -0.51613D-02
0.78049D-01 -0.23695D-01 0.23072D-01 0.65032D+00 0.75497D-01 0.20446D+00
0.19850D+00 0.36760D-02 0.11967D+00 0.36115D-01 0.11599D+00 0.14537D+00
-0.35519D-01 0.19980D+00 0.11769D+00 0.19467D+00 0.93191D-01 0.10000D+01
0.37122D-01 0.39142D+00 0.17466D+00 -0.35882D-01 0.47115D-01 -0.18783D-01
-0.15785D+00 -0.10612D+00 -0.12030D+00 0.73570D-01 0.68675D-01 -0.17744D+00
0.36428D+00 0.21544D+00 -0.14526D-01 0.19024D+00 0.42626D-01 0.29033D+00
0.10485D+00 0.18533D-01 0.10000D+01

Printout ends



3. Generate the item profiles

The following steps were implemented using routines written in S-Plus.

3.1 Generate item profiles from a linear factor model

The next step involves estimating a linear factor model using the tetrachoric correlations as
though they were product-moment correlations. The function "factanal" in S-Plus was used
to do this, using "mle" as the estimation method, and specifying that the model should use the
matrix of tetrachoric correlations.

To choose the number of components, models with 1, 2 and 3 components were estimated,
and at a later stage the model which gave the lowest value for the AIC was selected.

3.2 Transform the item profiles

Before using the item profiles in the item functions it is necessary to transform them, and to
estimate the constant terms, according to the method described. The result for the 3 factor
model is as follows.







                    b1           b2           b3           b0
bright     0.164443933   0.02387331   0.06656386  -0.67148568
chess     -0.212229035   0.02942951   1.80109987  -0.21662415
natgal     1.303975399   0.18451642   0.12909057  -1.44990555
hampt      0.746484240  -0.03754730   0.25781809  -1.02481696
science    0.839550959   0.04849160  -0.08324939  -0.06765865
whip       0.260917932   1.57653529   0.08194963  -1.51394915
lego       0.021755207   0.13893512   0.05992105  -0.06765865
east       0.190738004   0.38722325   0.16047012  -2.23537634
lonaqu     0.466563695   0.37955614  -0.14782961  -0.81908402
westab     1.070257914   0.01426026   0.05832279  -2.25396441
kew        0.998836592   0.25822544   0.13767828  -1.36827586
lonzoo     0.508300363   0.06881175  -0.08651507  -0.02898754
madamt     0.753812169   0.25212748   0.50785315  -1.46040233
britm      1.669208468   0.37442186   0.14157002  -1.66254774
oxford     1.341022995  -0.07555820  -0.08738219  -2.11247207
thorpe    -0.115980165   0.45865697   1.10414456  -0.74431547
nathist    0.802764028   0.24037708   0.04920244  -0.26391980
tower      1.317430770   0.45037219  -0.07341733  -1.13545286
wind       1.001775688   0.20237116   0.13371818  -1.73649679
woburn    -0.008890338   1.81306031  -0.04009937  -2.39263672
ch.10      0.372239988   0.05825895   0.84561467  -0.95952841


3.3 Choose the number of components 

The number of components was chosen by selecting the model, from the
three which were estimated, which has the lowest AIC. The AICs are:

Number of components    AIC
1                       13577.48
2                       13609.53
3                       13532.50

The lowest value of the AIC is achieved with 3 components. The selection
rule therefore specifies 3 components.



4. Make recommendations 

Once the item profiles have been generated they are used to make
recommendations. The following gives an example for a single user. The
routines to implement the steps were written in S-Plus, a widely available
statistical package. All the routines are straightforward and their functionality
could be replicated by one skilled in the art.

4.1 User history

The information set on which recommendations are based gives the visiting
history of the user, as well as information on the average age of her children.
In this case average child age is less than 10, and the user's history is:

bright chess natgal hampt science whip lego east lonaqu westab kew
     0     0      1     1       1    0    0    0      0      0   0

lonzoo madamt britm oxford thorpe nathist tower wind woburn ch.10
     0      0     0      0      0       0     0    0      0     0

4.2 Prior distribution over possible user profiles 

This history is used to update a prior distribution over possible user
profiles. The first task is to specify the possible profiles. Each possible
profile requires three numbers. In this example there are 125 possible
profiles. The following gives the first 10. It will be apparent what the
remainder would be.

      [,1] [,2] [,3]
 [1,]   -2   -2   -2
 [2,]   -2   -2   -1
 [3,]   -2   -2    0
 [4,]   -2   -2    1
 [5,]   -2   -2    2
 [6,]   -2   -1   -2
 [7,]   -2   -1   -1
 [8,]   -2   -1    0
 [9,]   -2   -1    1
[10,]   -2   -1    2

The probability of each possible profile that is assumed in the prior distribution is then 
specified. Here the binomial approximation described in the method is used (the following 
should be read as: the probability of the first profile is 0.00024, the probability of the 
second is 0.00098, the probability of the third is 0.00145 and so on). 

  [1] 0.0002441406 0.0009765625 0.0014648438 0.0009765625 0.0002441406
  [6] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [11] 0.0014648438 0.0058593750 0.0087890625 0.0058593750 0.0014648438
 [16] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [21] 0.0002441406 0.0009765625 0.0014648438 0.0009765625 0.0002441406
 [26] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [31] 0.0039062500 0.0156250000 0.0234375000 0.0156250000 0.0039062500
 [36] 0.0058593750 0.0234375000 0.0351562500 0.0234375000 0.0058593750
 [41] 0.0039062500 0.0156250000 0.0234375000 0.0156250000 0.0039062500
 [46] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [51] 0.0014648438 0.0058593750 0.0087890625 0.0058593750 0.0014648438
 [56] 0.0058593750 0.0234375000 0.0351562500 0.0234375000 0.0058593750
 [61] 0.0087890625 0.0351562500 0.0527343750 0.0351562500 0.0087890625
 [66] 0.0058593750 0.0234375000 0.0351562500 0.0234375000 0.0058593750
 [71] 0.0014648438 0.0058593750 0.0087890625 0.0058593750 0.0014648438
 [76] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [81] 0.0039062500 0.0156250000 0.0234375000 0.0156250000 0.0039062500
 [86] 0.0058593750 0.0234375000 0.0351562500 0.0234375000 0.0058593750
 [91] 0.0039062500 0.0156250000 0.0234375000 0.0156250000 0.0039062500
 [96] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
[101] 0.0002441406 0.0009765625 0.0014648438 0.0009765625 0.0002441406
[106] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
[111] 0.0014648438 0.0058593750 0.0087890625 0.0058593750 0.0014648438
[116] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
[121] 0.0002441406 0.0009765625 0.0014648438 0.0009765625 0.0002441406
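The grid of possible profiles and the prior listed above can be sketched in Python as follows. The binomial approximation itself is described elsewhere in the method; the sketch assumes, based only on the listed values, that each coordinate takes the levels -2 to 2 with Binomial(4, 1/2) weights (1, 4, 6, 4, 1)/16 and that the coordinates are independent. The function names are illustrative.

```python
from itertools import product
from math import comb


def profile_grid(levels, dims):
    """All possible profiles, with the last coordinate varying fastest."""
    return [list(p) for p in product(levels, repeat=dims)]


def binomial_prior(n_levels, dims):
    """Prior over the grid, assuming independent Binomial(n_levels - 1, 1/2) weights."""
    n = n_levels - 1
    weights = [comb(n, k) / 2 ** n for k in range(n_levels)]
    prior = []
    for idx in product(range(n_levels), repeat=dims):
        prob = 1.0
        for k in idx:
            prob *= weights[k]
        prior.append(prob)
    return prior


levels = [-2, -1, 0, 1, 2]
grid = profile_grid(levels, 3)             # 125 profiles
prior = binomial_prior(len(levels), 3)     # 125 probabilities summing to 1
```

Under these assumptions the first profile (-2, -2, -2) has probability 1/4096, about 0.00024, and the central profile (0, 0, 0), number 63 in the listing, has probability 216/4096, about 0.0527, in agreement with the values printed above.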

4.3 Posterior distribution over possible user profiles 

Having specified the prior distribution it is possible to update how likely each profile is using
Bayesian updating in the light of the user's visiting history and the average age of her children.
In doing so non-visits are treated as missing data.

  [1] 6.699979e-005 2.806902e-004 2.419982e-004 3.358869e-005
  [5] 7.632225e-007 2.590095e-004 1.048043e-003 8.304365e-004
  [9] 1.004806e-004 1.977892e-006 3.137828e-004 1.207297e-003
 [13] 8.576925e-004 8.910190e-005 1.532839e-006 9.168272e-005
 [17] 3.277910e-004 2.031615e-004 1.798016e-005 2.730554e-007
 [21] 2.713426e-006 8.786706e-006 4.663137e-006 3.543658e-007
 [25] 4.833893e-009 2.192618e-003 9.233442e-003 8.258069e-003
 [29] 1.155176e-003 2.430482e-005 7.648856e-003 3.110310e-002
 [33] 2.556259e-002 3.101062e-003 5.578774e-005 8.012018e-003
 [37] 3.093900e-002 2.274881e-002 2.345240e-003 3.622275e-005
 [41] 1.874434e-003 6.707115e-003 4.279089e-003 3.699688e-004
 [45] 4.941894e-006 4.171720e-005 1.352035e-004 7.347969e-005
 [49] 5.370655e-006 6.336093e-008 1.250701e-002 5.091771e-002
 [53] 4.476230e-002 5.986783e-003 1.105110e-004 3.542372e-002
 [57] 1.383032e-001 1.108921e-001 1.270664e-002 1.967364e-004
 [61] 2.803246e-002 1.029439e-001 7.306196e-002 6.990032e-003
 [65] 9.072425e-005 4.458134e-003 1.498357e-002 9.095821e-003
 [69] 7.134330e-004 7.807930e-006 6.285411e-005 1.892204e-004
 [73] 9.641495e-005 6.249456e-006 5.918083e-008 6.401432e-003
 [77] 2.328295e-002 1.831228e-002 2.146807e-003 3.223165e-005
 [81] 1.204728e-002 4.128927e-002 2.912702e-002 2.875144e-003
 [85] 3.551597e-005 5.800173e-003 1.831337e-002 1.122342e-002
 [89] 9.069408e-004 9.205726e-006 5.087200e-004 1.438586e-003
 [93] 7.401864e-004 4.808128e-005 4.049637e-007 3.859974e-006
 [97] 9.616884e-006 4.095597e-006 2.166825e-007 1.568099e-009
[101] 7.607398e-005 2.231007e-004 1.420848e-004 1.364434e-005
[105] 1.618849e-007 8.156078e-005 2.226466e-004 1.264308e-004
[109] 1.023321e-005 1.003628e-007 2.188857e-005 5.445354e-005
[113] 2.677570e-005 1.778263e-006 1.439724e-008 1.051691e-006
[117] 2.329810e-006 9.638923e-007 5.174587e-008 3.504214e-010
[121] 4.653072e-009 9.110448e-009 3.149613e-009 1.391284e-010
[125] 8.202664e-013
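The updating step can be illustrated with a short Python sketch. It is a generic discrete Bayes update under stated assumptions, not the S-Plus routine used for the figures above and not an attempt to reproduce them: the item function is assumed to be logistic in b0 + a.b, only visited items contribute to the likelihood (non-visits are skipped as missing), and all names are hypothetical.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def visit_probability(profile, item):
    """Assumed item function: logistic(b0 + sum_k profile[k] * b[k])."""
    b0, b = item
    return logistic(b0 + sum(z * bk for z, bk in zip(profile, b)))


def posterior(prior, profiles, items, history):
    """Posterior over the possible profiles given a 0/1 visiting history.

    prior    -- prior probability of each possible profile
    profiles -- the possible profiles, in the same order as prior
    items    -- (b0, [b1, ..., bk]) for each item
    history  -- 1 if the item was visited, 0 otherwise
    """
    weights = []
    for p, z in zip(prior, profiles):
        likelihood = 1.0
        for item, visited in zip(items, history):
            if visited == 1:  # non-visits are treated as missing data
                likelihood *= visit_probability(z, item)
        weights.append(p * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: two possible one-dimensional profiles and a single visited item.
toy = posterior([0.5, 0.5], [[-1.0], [1.0]], [(0.0, [1.0])], [1])
```

In the toy example the visit shifts the posterior towards the profile under which the visit was more likely; with no visits at all the posterior equals the prior.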



4.4 Probability of a visit 

This posterior distribution over possible user profiles is then used to work out the likelihood of
a visit to each of the 20 attractions. The probability of a visit to Brighton, say, is calculated by
working out, for each possible profile, what the probability of visiting Brighton is, and then
weighting each of these using the probability that the user's profile is the relevant one. The
result is:

 [1] 0.3801371 0.3874973 0.5104397 0.4524723 0.6982596 0.3164832
 [7] 0.4895891 0.1248395 0.4433899 0.2850701 0.4509532 0.6339611
[13] 0.3587119 0.5523940 0.3858625 0.3125870 0.6476852 0.5853585
[19] 0.3711684 0.1843304
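The weighting described above is a posterior-weighted average, which can be sketched in Python as follows. The function name and the toy numbers are illustrative assumptions; the per-profile visit probabilities would come from the item functions of the method.

```python
def expected_visit_probabilities(posterior, per_profile_probs):
    """Average the visit probabilities over the possible profiles.

    posterior         -- probability of each possible profile (sums to 1)
    per_profile_probs -- per_profile_probs[k][j] is the probability of a
                         visit to item j if profile k is the relevant one
    """
    n_items = len(per_profile_probs[0])
    return [
        sum(w * row[j] for w, row in zip(posterior, per_profile_probs))
        for j in range(n_items)
    ]


# Toy example with two possible profiles and two items.
mixed = expected_visit_probabilities([0.25, 0.75], [[0.2, 0.8], [0.6, 0.4]])
```

For the first item the toy calculation is 0.25 x 0.2 + 0.75 x 0.6 = 0.5: each profile's visit probability is weighted by how likely that profile is.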
4.5 Make a recommendation

The recommended attraction is the one with the highest probability of a visit that has
not yet been visited. The attraction with the highest probability of a visit is number 5, the
Science Museum. The user has already visited this, however, so it is not recommended.
The recommendation is item 17, the Natural History Museum. The expected probability is
0.648.

Appendix A 

This is a numerical example of the implementation of a preferred method 
according to the invention. 



1. Specify the data 
1.1 The set of items 

The data in the example describe visits to a number of London attractions.
There are 20 attractions. These attractions are labelled in various ways in
what follows. The labels, and the attraction identities, are:

BRIGHTON   Brighton                 1
CHESS      Chessington              2
NATGAL     National Gallery         3
HAMPTON    Hampton Court Gardens    4
SCIENCE    Science Museum           5
WHIPSNDE   Whipsnade                6
LEGO       Legoland                 7
EASTBORN   Eastbourne               8
LONAQUA    London Aquarium          9
WESTABBY   Westminster Abbey       10
KEW        Kew Gardens             11
LONZOO     London Zoo              12
MADTUS     Madam Tussauds          13
BRITMUS    British Museum          14
OXFORD     Oxford                  15
THORPE     Thorpe Park             16
NATHIST    Natural History Museum  17
TOWER      Tower of London         18
WINDSOR    Windsor Castle          19
WOBORN     Woburn Wildlife Park    20



1.2 The data set 

The data records attendance at each attraction for 624 users. Each user is represented by a
row in the data set. The first column in the row is the first attraction (Brighton), the second
column is the second attraction (Chessington) and so on. The data records "1" if the user has
visited the attraction in the past 4 years, and 0 otherwise. The following gives the first 10
records from the dataset (the full set is in an appendix). As an example, this data records that
the first user has visited Brighton and the National Gallery, but not Chessington.

Extract begins

1 0 1 1 1 0 0 0 1 1 1 1 1 1 1 0 1 1 1 0
1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 0
0 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 0
0 0 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 0
0 0 1 0 1 0 0 0 1 1 1 0 0 1 0 0 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 0 1 1 1 0 1 0 1 0 0 1 1 1 0
1 1 0 1 1 1 1 0 0 1 1 1 0 1 0 1 1 0 0 1
1 0 1 0 1 1 0 0 0 0 1 0 0 1 1 0 1 1 0 0
0 1 1 1 1 0 0 0 0 0 1 0 0 1 0 0 1 1 1 0

Extract ends



2. Generate the item profiles 

To derive the item profiles from the data the program TWOMISS was used. Two components
were specified. This specification is convenient when the administrator wants to visualise the
results.

2.1 Inputs

Generating item profiles from TWOMISS required setting up a command file that contained the
commands and the data. The command file, including the first 10 lines of data, was as follows.



Extract begins

attractions data
624 20 16
110 0 1 1000 1 0.00000001
10111000111111101110
11111011111111011110
01111010011111111110
00111010111111101110
00101000111001001000
11111111111111111111
01111101110101001110
11011110011101011001
10101100001001101100
01111000001001001110

Extract ends



2.2 Outputs 

TWOMISS generated the following output file. Only an extract is shown; many of the
diagnostic results are omitted.

Extract begins

*** PROGRAM TWOMISS ***
MAXIMUM LIKELIHOOD ESTIMATION OF A 2 FACTOR LOGIT/PROBIT
MODEL 1 for NON-RESPONSES for BINARY DATA
attractions data

NUMBER OF OBSERVED VARIABLES = 20
NUMBER OF CASES SAMPLED = 624
NUMBER OF DIFFERENT RESPONSE PATTERNS = 543
NUMBER OF ITERATIONS IS 408
% OF G-SQUARE EXPLAINED 9.7217
LOGLIKELIHOOD VALUE -6301.4533
LIKELIHOOD RATIO STAT. 3075.62681
DEGREES OF FREEDOM -48

MAXIMUM LIKELIHOOD ESTIMATES OF ITEM PARAMETERS AND STANDARD DEVIATIONS

ITEM I  ALPHA(0,I)     S.D  ALPHA(1,I)     S.D  ALPHA(2,I)     S.D  P(X=1/Z=0)

     1     -0.6802  0.0926      0.0704  0.1211      0.0539  0.1331       0.336
     2     -0.2718  0.1073      0.5666  0.7178     -0.7902  0.5099       0.432
     3     -1.8687  0.1779      0.4720  1.0221      1.1784  0.4671       0.134
     4     -1.1091  0.1094      0.3798  0.4086      0.4534  0.3757       0.248
     5     -0.0792  0.1108      0.7731  0.6404      0.7170  0.7036       0.480
     6     -1.6246  0.1273      0.5688  0.1822      0.1073  0.5121       0.165
     7     -0.0812  0.0936      0.4707  0.2271     -0.1895  0.4279       0.480
     8     -2.2609  0.1484      0.1971  0.1746      0.0936  0.2577       0.094
     9     -0.8844  0.1028      0.3768  0.3787      0.4252  0.3589       0.292
    10     -2.6064  0.2221      0.2910  0.8004      0.9070  0.3510       0.069
    11     -1.5944  0.1369      0.6185  0.6250      0.6698  0.5662       0.169
    12     -0.0344  0.1014      0.7496  0.2182      0.1763  0.6720       0.491
    13     -1.5998  0.1284      0.6243  0.2503      0.2417  0.5751       0.168
    14     -2.2586  0.2023      0.8328  1.0463      1.2082  0.7884       0.095
    15     -2.4845  0.1922      0.5724  0.7306      0.8150  0.5343       0.077
    16     -2.5609  2.2307      3.6515  4.8844     -3.4526  4.6125       0.072
    17     -0.3246  0.1147      0.8504  0.6313      0.6654  0.7504       0.420
    18     -1.3700  0.1336      0.6666  0.6878      0.7828  0.6334       0.203
    19     -1.9593  0.1485      0.6560  0.4665      0.4697  0.5873       0.124
    20     -2.5633  0.1844      0.6230  0.2112      0.0168  0.5718       0.072

Extract ends



Looking at the table, the attraction is identified in the first column. The item profiles are
given in the columns marked "ALPHA(0,I)", "ALPHA(1,I)" and "ALPHA(2,I)". The
first of these is the constant term b0. The other columns give measures of the statistical fit
of the model.

As an example consider the British Museum. This is item number 14. The results above give
the item profile for the British Museum as:

(b0, b1, b2) = (-2.2586, 0.8328, 1.2082)
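A small Python check may help in reading the table. The column P(X=1/Z=0) agrees, to the three decimals printed, with the logistic function applied to the constant term ALPHA(0,I); the sketch below illustrates this for a few items. Extending the same logistic form to non-zero user profiles, as `visit_probability` does, is an assumption made here for illustration (the program output describes the model only as LOGIT/PROBIT), and the function names are hypothetical.

```python
import math


def baseline_probability(alpha0):
    """Probability of a visit for a user whose profile is zero: logistic(alpha0)."""
    return 1.0 / (1.0 + math.exp(-alpha0))


def visit_probability(item_profile, z):
    """Assumed logit item function: logistic(b0 + b1*z1 + b2*z2)."""
    b0, b1, b2 = item_profile
    return baseline_probability(b0 + b1 * z[0] + b2 * z[1])


british_museum = (-2.2586, 0.8328, 1.2082)

# Baseline probability for the British Museum (item 14).
p_zero = baseline_probability(british_museum[0])

# Under the assumed logit form, a user with profile (1, 1) is far more likely to visit.
p_keen = visit_probability(british_museum, (1.0, 1.0))
```

For the British Museum the baseline probability rounds to 0.095, the value printed in the last column of the table.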



3. Make recommendations 

Once the item profiles have been generated they are used to make recommendations. The following gives 
an example for a single user. The routines to implement the steps were written in S-Plus, a widely available 
statistical package. All the routines are straightforward and their functionality could be replicated by one skilled 
in the art. 

3.1 User history 

The information set on which recommendations are based gives the visiting history of the user. This is:

bright chess natgal hampt science whip lego east lonaqu westab kew
     0     0      1     1       1    0    0    0      0      0   0

lonzoo madamt britm oxford thorpe nathist tower wind woburn
     0      0     0      0      0       0     0    0      0



3.2 Prior distribution over possible user profiles 

This history is used to update a prior distribution over possible user profiles. The first task is to specify the
possible profiles. Each possible profile requires two numbers. In this example the possible profiles are:

      [,1] [,2]
 [1,]   -2   -2
 [2,]   -2   -1
 [3,]   -2    0
 [4,]   -2    1
 [5,]   -2    2
 [6,]   -1   -2
 [7,]   -1   -1
 [8,]   -1    0
 [9,]   -1    1
[10,]   -1    2
[11,]    0   -2
[12,]    0   -1
[13,]    0    0
[14,]    0    1
[15,]    0    2
[16,]    1   -2
[17,]    1   -1
[18,]    1    0
[19,]    1    1
[20,]    1    2
[21,]    2   -2
[22,]    2   -1
[23,]    2    0
[24,]    2    1
[25,]    2    2

The probability of each possible profile that is assumed in the prior distribution is then specified. Here the
binomial approximation described in the method is used (the following should be read as: the probability of
the first profile is 0.0039, the probability of the second is 0.0156, the probability of the third is 0.0234 and so
on).

[1] 0.00390625 0.01562500 0.02343750 0.01562500 0.00390625 
[6] 0.01562500 0.06250000 0.09375000 0.06250000 0.01562500 
[11] 0.02343750 0.09375000 0.14062500 0.09375000 0.02343750 
[16] 0.01562500 0.06250000 0.09375000 0.06250000 0.01562500 
[21] 0.00390625 0.01562500 0.02343750 0.01562500 0.00390625 

3.3 Posterior distribution over possible user profiles 

Having specified the prior distribution it is possible to update how likely each profile is using Bayesian updating 
in the light of the user's visiting history. In doing so non-visits are treated as missing data. 

 [1] 4.216343e-005 2.112094e-003 2.653238e-002 8.865934e-002
 [5] 4.837746e-002 1.109330e-004 1.388096e-002 1.472363e-001
 [9] 3.019428e-001 7.143967e-002 7.536219e-006 6.086883e-003
[13] 1.288960e-001 1.397300e-001 1.195930e-002 8.154766e-008
[17] 5.951040e-005 5.049851e-003 7.615486e-003 2.471819e-004
[21] 1.408664e-010 5.562026e-008 2.743733e-006 1.069964e-005
[25] 5.195977e-007



3.4 Probability of a visit 

This posterior distribution over possible user profiles is then used to work out the likelihood of a visit to each
attraction. The probability of a visit to Brighton, say, is calculated by working out, for each possible profile,
what the probability of visiting Brighton is, and then weighting each of these using the probability that the user's
profile is the relevant one. The result is:

 [1] 0.3602410 0.3465327 0.4420367 0.4132967 0.7439769 0.2564223
 [7] 0.5088269 0.1176002 0.4583606 0.2129104 0.3982676 0.6469330
[13] 0.2979243 0.4219590 0.2499722 0.2270095 0.6982817 0.4828844
[19] 0.2829756 0.1180267



3.5 Make a recommendation 

The recommended attraction is the one with the highest probability of a visit that has not yet been
visited. The attraction with the highest probability of a visit is number 5, the Science Museum. The user has
already visited this, however, so it is not recommended. The recommendation is item 17, the Natural History
Museum. The expected probability is 0.698.
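The selection rule can be checked with a few lines of Python. The sketch below is illustrative only (the function name is an assumption); the probabilities and the visiting history are the ones printed in sections 3.4 and 3.1 of this appendix.

```python
def recommend_unvisited(probabilities, history):
    """Return (item number, probability) of the most likely item not yet visited.

    Item numbers are 1-based, as in the tables of this appendix.
    """
    candidates = [
        (p, item)
        for item, (p, visited) in enumerate(zip(probabilities, history), start=1)
        if visited == 0
    ]
    if not candidates:
        raise ValueError("every item has already been visited")
    best_p, best_item = max(candidates)
    return best_item, best_p


# Probabilities of a visit from section 3.4 (items 1 to 20).
probabilities = [
    0.3602410, 0.3465327, 0.4420367, 0.4132967, 0.7439769, 0.2564223,
    0.5088269, 0.1176002, 0.4583606, 0.2129104, 0.3982676, 0.6469330,
    0.2979243, 0.4219590, 0.2499722, 0.2270095, 0.6982817, 0.4828844,
    0.2829756, 0.1180267,
]
# Visiting history from section 3.1: items 3, 4 and 5 have been visited.
history = [0, 0, 1, 1, 1] + [0] * 15

item, probability = recommend_unvisited(probabilities, history)
```

Item 5 has the largest probability overall but is excluded because it has been visited, so the rule returns item 17 with probability 0.6982817.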



Appendix I 



The following is an example of the alternative preferred method, using tetrachoric correlations of observations 
to estimate the correlations between continuous variables. 



1. Specify the data 
1.1 The set of items 

The data in the example describe visits to a number of London attractions. There are 20 attractions. These
attractions are labelled in various ways in what follows. The labels, and the attraction identities, are:

BRIGHTON Brighton 1

CHESS Chessington 2

NATGAL National Gallery 3

HAMPTON Hampton Court Gardens 4

SCIENCE Science Museum 5

WHIPSNDE Whipsnade 6

LEGO Legoland 7

EASTBORN Eastbourne 8

LONAQUA London Aquarium 9

WESTABBY Westminster Abbey 10

KEW Kew Gardens 11

LONZOO London Zoo 12

MADTUS Madam Tussauds 13

BRITMUS British Museum 14

OXFORD Oxford 15

THORPE Thorpe Park 16

NATHIST Natural History Museum 17

TOWER Tower of London 18

WINDSOR Windsor Castle 19

WOBORN Woburn Wildlife Park 20



1.2 The data set

The data records attendance at each attraction for 624 users. Each user is represented by a row in the data 
set. The first column in the row is the first attraction (Brighton), the second column is the second attraction 
(Chessington) and so on. The data records "1" if the user has visited the attraction in the past 4 years, 
and 0 otherwise. The following gives the first 10 records from the dataset (the full set is in appendix B1). 
As an example, this data records that the first user has visited Brighton and the National Gallery, but not 
Chessington. 



Extract begins 



1 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 1 0 0 1 1 1 0



Extract ends 



2. Generate the tetrachoric correlations 

The tetrachoric correlations were calculated using PRELIS, which is distributed with LISREL, a widely
available statistical package. The following is a printout of the output file. The figures should be read from left
to right and give only the lower left triangle of the correlation matrix. For example, the first number is the
tetrachoric correlation between items (1,1), i.e. between Brighton and Brighton, and so is 1 by definition. The
second figure is the tetrachoric correlation between items (2,1), i.e. between Chessington and
Brighton. The third figure is for items (2,2), and so on. The pattern is built up as:

1st (1,1)

2nd and 3rd (2,1) (2,2)

4th, 5th and 6th (3,1) (3,2) (3,3) ...
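
The reading order described above can be illustrated with a short Python sketch. It is not part of the original S-Plus or PRELIS workflow; the function names are hypothetical, and the only inputs used are the first six figures of the printout. It converts the Fortran-style "D" exponents and places the values into a symmetric matrix row by row.

```python
import numpy as np

def parse_fortran_float(token):
    """Convert a Fortran-style literal such as '0.30859D-01' to a float."""
    return float(token.replace("D", "E").replace("d", "e"))

def unpack_lower_triangle(tokens):
    """Build a symmetric matrix from values listed as (1,1), (2,1), (2,2), (3,1), ..."""
    values = [parse_fortran_float(t) for t in tokens]
    # n(n+1)/2 values describe an n x n lower triangle; solve for n.
    n = int((np.sqrt(8 * len(values) + 1) - 1) / 2)
    if n * (n + 1) // 2 != len(values):
        raise ValueError("number of values is not a triangular number")
    matrix = np.zeros((n, n))
    matrix[np.tril_indices(n)] = values  # tril_indices is ordered row by row
    # Mirror the strict lower triangle into the upper triangle.
    return matrix + np.tril(matrix, -1).T

# First three rows of the printout: Brighton, Chessington, National Gallery.
first_three_rows = ["0.10000D+01", "0.30859D-01", "0.10000D+01",
                    "0.16190D+00", "-0.57209D-02", "0.10000D+01"]
corr = unpack_lower_triangle(first_three_rows)
print(corr)
```

Passing all 210 figures of the printout would give the full 20 x 20 matrix in the same way.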
Printout starts

0.10000D+01
0.30859D-01 0.10000D+01
0.16190D+00 -0.57209D-02 0.10000D+01
0.24375D+00 0.89119D-01 0.28443D+00 0.10000D+01
0.44469D-01 -0.83145D-01 0.38516D+00 0.23402D+00 0.10000D+01
0.51530D-02 0.35267D-01 0.14557D+00 0.47440D-01 0.94268D-01 0.10000D+01
-0.98718D-01 0.38950D-01 -0.11513D+00 0.38859D-01 0.98427D-01 0.12480D+00 0.10000D+01
0.16793D+00 0.79544D-01 0.87762D-01 0.66322D-02 -0.19969D-01 0.17030D+00 -0.23559D+00 0.10000D+01
0.132?0D-01 -0.36938D-01 0.67831D-01 -0.13165D-01 0.21256D+00 0.50056D-01 0.17875D+00 -0.90583D-01 0.10000D+01
0.80235D-01 -0.28762D-01 0.37060D+00 0.15095D+00 -0.87271D-02 -0.21707D-01 -0.80627D-01 0.24432D+00 0.10601D+00 0.10000D+01
-0.63046D-02 0.15365D-01 0.33770D+00 0.37511D+00 0.26084D+00 0.50825D-01 0.10574D+00 -0.38016D-01 0.19673D+00 0.18665D+00 0.10000D+01
0.22228D-01 -0.69500D-01 0.31688D-01 0.56343D-01 0.23850D+00 0.12369D+00 0.19915D+00 -0.99709D-02 0.28168D+00 0.12087D+00 0.23019D+00 0.10000D+01
-0.61246D-02 0.20887D+00 0.16278D+00 0.45582D-02 0.85736D-01 0.87777D-01 -0.37335D-02 0.91217D-01 0.40034D-01 0.24536D+00 0.10920D+00 0.15821D+00 0.10000D+01
0.10096D+00 -0.15898D-01 0.51349D+00 0.22662D+00 0.35285D+00 0.78836D-01 -0.10993D+00 0.90954D-01 0.22947D+00 0.31309D+00 0.25470D+00 0.17321D+00 0.27222D+00 0.10000D+01
0.57412D-01 -0.10519D+00 0.30978D+00 0.22930D+00 0.12568D+00 0.15159D+00 -0.46045D-02 0.27738D-01 0.15598D+00 0.34436D+00 0.43601D+00 0.11179D+00 0.23991D+00 0.35995D+00 0.10000D+01
0.57234D-01 0.51653D+00 -0.13304D+00 -0.77538D-02 0.43194D-01 0.13457D+00 0.24292D+00 0.71213D-01 -0.50154D-01 -0.10765D+00 0.41262D-01 0.23294D+00 0.15306D+00 0.49770D-02 0.36588D-01 0.10000D+01
-0.11794D+00 -0.14578D-01 0.25259D+00 0.10309D+00 0.48637D+00 0.22474D+00 0.15963D+00 0.74381D-01 0.26358D+00 0.85570D-01 0.16692D+00 0.26353D+00 0.23114D+00 0.39571D+00 0.90043D-01 0.43015D-01 0.10000D+01
-0.11512D-01 -0.91696D-01 0.18703D+00 0.17115D+00 0.17169D+00 0.25122D+00 0.52008D-02 0.25591D+00 0.15690D+00 0.40467D+00 0.23005D+00 0.14052D+00 0.33738D+00 0.42158D+00 0.31277D+00 0.86295D-02 0.22952D+00 0.10000D+01
-0.43889D-01 0.12507D-01 0.19668D+00 0.29888D+00 0.22309D-01 0.17741D+00 0.33198D+00 0.41637D-01 0.12746D+00 0.33775D+00 0.24784D+00 0.14507D+00 0.24306D+00 0.23457D+00 0.43265D+00 0.97836D-01 0.50860D-01 0.42644D+00 0.10000D+01
0.14261D-01 -0.22059D-02 0.79836D-01 -0.21568D-01 0.26212D-01 0.65122D+00 0.78564D-01 0.20582D+00 0.20058D+00 0.51469D-02 0.12147D+00 0.39297D-01 0.11774D+00 0.14699D+00 -0.33985D-01 0.20193D+00 0.12043D+00 0.19653D+00 0.94825D-01 0.10000D+01

Printout ends
3. Generate the item profiles

The following steps were implemented using routines written in S-Plus.

3.1 Generate item profiles from a linear factor model

The next step involves estimating a linear factor model using the tetrachoric correlations as though they were
product-moment correlations. The function "factanal" in S-Plus was used to do this, using "mle" as the
estimation method, and specifying that the model should use the matrix of tetrachoric correlations.

To choose the number of components, models with 1, 2 and 3 components were estimated, and the model
which gave the lowest value for the AIC was selected. Here just the output for the 3 factor model is given. In
this list Brighton, for example, is identified as "X1".







              b1           b2            b3
X1    0.09812377   0.01172569   0.058754708
X2   -0.04223647  -0.04764051   0.524952031
X3    0.58772477   0.10554566  -0.131620998
X4    0.40369691  -0.01218747   0.003927246
X5    0.42576703   0.03238520   0.050496584
X6    0.10662699   0.65120393   0.060790719
X7    0.03506458   0.05954881   0.238530868
X8    0.11046878   0.20506293   0.050144673
X9    0.25271908   0.21336301  -0.069474679
X10   0.51048182   0.02588921  -0.098528948
X11   0.49170279   0.13060467   0.038550361
X12   0.28804377   0.02624733   0.238872437
X13   0.36181297   0.11430611   0.149815576
X14   0.65958452   0.16336789   0.002362186
X15   0.59758813  -0.02425055   0.054954849
X16  -0.02527818   0.11813677   0.992629902
X17   0.40883780   0.12757439   0.038566893
X18   0.54724404   0.21079612  -0.002458373
X19   0.48305439   0.09853702   0.099141707
X20  -0.02418029   0.99611314   0.084262195



3.2 Transform the item profiles 

Before using the item profiles in the item functions it is necessary to transform them, and to estimate the 
constant terms, according to the method described. The result for the 3 factor model is as follows. 

                  b1           b2            b3           b0
bright    0.17916486   0.02141001   0.107280622  -0.67148568
chess    -0.09026066  -0.10180926   1.121838928  -0.21662415
natgal    1.34721208   0.24193703  -0.301708229  -1.44990555
hampt     0.80041830  -0.02416434   0.007786632  -1.02481696
science   0.85536112   0.06506150   0.101447062  -0.06765865
whip      0.25824137   1.57715976   0.147229879  -1.51394915
lego      0.06565695   0.11150264   0.446638983  -0.06765865
east      0.20630971   0.38297223   0.093649385  -2.23537634
lonaqu    0.48703898   0.41119215  -0.133891260  -0.81908402
westab    1.08441820   0.05499653  -0.209305366  -2.25396441
kew       1.03697579   0.27543851   0.081300719  -1.36827586
lonzoo    0.56361160   0.05135782   0.467398672  -0.02898754
madamt    0.71878587   0.22708312   0.297627027  -1.46040233
britm     1.63067053   0.40388941   0.005839960  -1.66254774
oxford    1.35564366  -0.05501297   0.124666452  -2.11247207
thorpe   -0.04584748   0.21426669   1.800349935  -0.74431547
nathist   0.82136797   0.25630094   0.077482099  -0.26891980
tower     1.22543682   0.47203314  -0.005505005  -1.13545286
wind      1.01365495   0.20677286   0.208041754  -1.73649679
woburn   -0.04385657   1.80668272   0.152829077  -2.39263672
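
The transformation itself is defined elsewhere in the method and is not restated in this section. As a hedged illustration only, the Python sketch below applies a rescaling that is numerically consistent with the printed values for some rows: each loading is divided by the square root of the item's uniqueness and multiplied by pi/sqrt(3), and the constant term is the logit of the item's overall visit rate. Both formulas are assumptions inferred from the numbers (the visit rate is taken from the hbar column of Example 7), and the function name is hypothetical. The rescaling reproduces the Brighton and National Gallery rows but not the Thorpe Park or Woburn rows, whose loadings are close to 1.

```python
import math

def transform_profile(loadings, base_rate):
    """Rescale factor loadings to a logit scale and derive a constant term.

    Assumption (inferred from the printed numbers, not stated in this section):
    slopes = loadings * (pi / sqrt(3)) / sqrt(uniqueness),
    constant = logit(base_rate).
    """
    uniqueness = 1.0 - sum(l * l for l in loadings)
    scale = (math.pi / math.sqrt(3.0)) / math.sqrt(uniqueness)
    slopes = [l * scale for l in loadings]
    constant = math.log(base_rate / (1.0 - base_rate))
    return slopes, constant

# Brighton (X1): loadings from section 3.1, visit rate from the hbar column of Example 7.
slopes, constant = transform_profile([0.09812377, 0.01172569, 0.058754708], 0.33816425)
print(slopes, constant)
```

For Brighton this gives slopes and a constant that agree with the first row of the table above to about five decimal places.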



3.3 Choose the number of components 

The number of components is chosen by selecting, from the three models which have been estimated, the one
which has the lowest AIC. The AICs are:

Number of components    AIC
1                       12844.76
2                       12875.14
3                       12833.8

The lowest value of the AIC is achieved with 3 components. The selection rule therefore specifies 3
components.



4. Make recommendations 

Once the item profiles have been generated they are used to make recommendations. The following gives 
an example for a single user. The routines to implement the steps were written in S-Plus, a widely available 
statistical package. All the routines are straightforward and their functionality could be replicated by one skilled 
in the art. 

4.1 User history 

The information set on which recommendations are based gives the visiting history of the user. This is: 
bright chess natgal hampt science whip lego east lonaqu westab kew
     0     0      1     1       1    0    0    0      0      0   0

lonzoo madamt britm oxford thorpe nathist tower wind woburn
     0      0     0      0      0       0     0    0      0

4.2 Prior distribution over possible user profiles 

This history is used to update a prior distribution over possible user profiles. The first task is to specify the 
possible profiles. Each possible profile requires three numbers. In this example there are 125 possible 
profiles. The following gives the first 10. It will be apparent what the remainder would be. 





       [,1] [,2] [,3]
 [1,]   -2   -2   -2
 [2,]   -2   -2   -1
 [3,]   -2   -2    0
 [4,]   -2   -2    1
 [5,]   -2   -2    2
 [6,]   -2   -1   -2
 [7,]   -2   -1   -1
 [8,]   -2   -1    0
 [9,]   -2   -1    1
[10,]   -2   -1    2



The probability of each possible profile that is assumed in the prior distribution is then specified. The 
binomial approximation described in the method is used (the following should be read as: the probability of 
the first profile is 0.00024, the probability of the second is 0.00098, the probability of the third is 0.00145 
and so on). 



  [1] 0.0002441406 0.0009765625 0.0014648438 0.0009765625 0.0002441406
  [6] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [11] 0.0014648438 0.0058593750 0.0087890625 0.0058593750 0.0014648438
 [16] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [21] 0.0002441406 0.0009765625 0.0014648438 0.0009765625 0.0002441406
 [26] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [31] 0.0039062500 0.0156250000 0.0234375000 0.0156250000 0.0039062500
 [36] 0.0058593750 0.0234375000 0.0351562500 0.0234375000 0.0058593750
 [41] 0.0039062500 0.0156250000 0.0234375000 0.0156250000 0.0039062500
 [46] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [51] 0.0014648438 0.0058593750 0.0087890625 0.0058593750 0.0014648438
 [56] 0.0058593750 0.0234375000 0.0351562500 0.0234375000 0.0058593750
 [61] 0.0087890625 0.0351562500 0.0527343750 0.0351562500 0.0087890625
 [66] 0.0058593750 0.0234375000 0.0351562500 0.0234375000 0.0058593750
 [71] 0.0014648438 0.0058593750 0.0087890625 0.0058593750 0.0014648438
 [76] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [81] 0.0039062500 0.0156250000 0.0234375000 0.0156250000 0.0039062500
 [86] 0.0058593750 0.0234375000 0.0351562500 0.0234375000 0.0058593750
 [91] 0.0039062500 0.0156250000 0.0234375000 0.0156250000 0.0039062500
 [96] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
[101] 0.0002441406 0.0009765625 0.0014648438 0.0009765625 0.0002441406
[106] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
[111] 0.0014648438 0.0058593750 0.0087890625 0.0058593750 0.0014648438
[116] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
[121] 0.0002441406 0.0009765625 0.0014648438 0.0009765625 0.0002441406
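
The grid of candidate profiles and its prior can be sketched in Python as follows. This is an illustration rather than the original S-Plus routine. It assumes that each of the three profile components takes the values -2 to 2 and that the weights on those five points are Binomial(4, 0.5) probabilities, an assumption which is consistent with the probabilities listed above (for example 1/4096 = 0.0002441406 for the first profile).

```python
from itertools import product
from math import comb

GRID = [-2, -1, 0, 1, 2]   # support points for each profile component
N_COMPONENTS = 3

# Binomial(4, 0.5) weights on the five support points: 1/16, 4/16, 6/16, 4/16, 1/16.
weights = [comb(4, k) * 0.5 ** 4 for k in range(5)]

# itertools.product varies the last component fastest, matching the listing order.
profiles = list(product(GRID, repeat=N_COMPONENTS))
prior = [weights[i] * weights[j] * weights[k]
         for i, j, k in product(range(5), repeat=N_COMPONENTS)]

print(len(profiles), profiles[:3], prior[:3])
```

The components are treated as independent in the prior, so each profile's probability is simply the product of its three weights.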



4.3 Posterior distribution over possible user profiles 

Having specified the prior distribution it is then possible to update how likely each profile is using Bayesian 
updating in the light of the user's visiting history. In doing so non-visits are treated as missing data. 

  [1] 8.749907e-005 1.820013e-004 8.450827e-005 6.548309e-006
  [5] 7.164878e-008 3.961831e-004 8.156683e-004 3.634953e-004
  [9] 2.570837e-005 2.632381e-007 5.792464e-004 1.157804e-003
 [13] 4.825574e-004 3.053029e-005 2.878185e-007 ?.242?54e-004
 [17] 4.107871e-004 1.499652e-004 8.003480e-006 ?.562??1e-008
 [21] 9.523444e-006 1.521454e-005 4.651408e-006 2.044132e-007
 [25] 1.441148e-009 3.548322e-003 7.103657e-003 3.155501e-003
 [29] 2.311364e-004 2.311808e-006 1.432083e-002 2.831893e-002
 [33] 1.204498e-002 8.023704e-004 7.466107e-006 1.782866e-002
 [37] 3.410567e-002 1.350949e-002 8.000372e-004 6.798161e-006
 [41] 5.443664e-003 9.491454e-003 3.273783e-003 1.622767e-004
 [45] 1.189165e-006 1.696725e-004 2.579233e-004 7.446106e-005
 [49] 3.032338e-006 1.906306e-008 2.416957e-002 4.609570e-002
 [53] 1.921800e-002 1.300825e-003 1.161696e-005 7.619505e-002
 [57] 1.435425e-001 5.727368e-002 3.518754e-003 2.910110e-005
 [61] 6.842617e-002 1.244226e-001 4.611078e-002 2.507375e-003
 [65] 1.881609e-005 1.348691e-002 2.226247e-002 7.160354e-003
 [69] 3.245205e-004 2.091073e-006 2.495306e-004 3.594790e-004
 [73] 9.701760e-005 3.619574e-006 2.006631e-008 1.302715e-002
 [77] 2.367770e-002 9.259014e-003 5.789887e-004 4.610520e-006
 [81] 2.541782e-002 4.550767e-002 1.703579e-002 9.686878e-004
 [85] 7.152861e-006 1.286919e-002 2.206853e-002 7.645826e-003
 [89] 3.843336e-004 2.575478e-006 1.297935e-003 1.999784e-003
 [93] 5.987266e-004 2.508436e-005 1.449616e-007 1.201406e-005
 [97] 1.605980e-005 4.036751e-006 1.399459e-007 7.033403e-010
[101] 1.451943e-004 2.442635e-004 8.941886e-005 5.290626e-006
[105] 3.924750e-008 1.519482e-004 2.483600e-004 8.636743e-005
[109] 4.638888e-006 3.200580e-008 4.069437e-005 6.263256e-005
[113] 1.993554e-005 9.415378e-007 5.897003e-009 2.164317e-006
[117] 2.948934e-006 8.044585e-007 3.159448e-008 1.714367e-010
[121] 1.139329e-008 1.338166e-008 3.060821e-009 9.973320e-011
[125] 4.745151e-013



4.4 Probability of a visit 

This posterior distribution over possible user profiles is then used to work out the likelihood of a visit to each 
attraction. The probability of a visit to Brighton, say, is calculated by working out, for each possible profile, 
what the probability of visiting Brighton is, and then weighting each of these using the probability that the user's 
profile is the relevant one. The result is: 

 [1] 0.3870819 0.4108272 0.5532911 0.4876843 0.7103175 0.3310440
 [7] 0.4949912 0.1313193 0.4609472 0.3095996 0.4826755 0.6374526
[13] 0.3675939 0.5743559 0.4031034 0.3512299 0.6664543 0.5865752
[19] 0.3916554 0.1871927
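
The updating and weighting steps of sections 4.3 and 4.4 can be sketched in Python as below. This is a minimal illustration, not the original S-Plus routines: the function names are hypothetical, the item function is assumed to be logistic in the constant term plus the product of the item and user profiles, and missing observations are encoded as NaN. Only four items from the table in section 3.2 are used, to keep the example short.

```python
import numpy as np
from itertools import product

def visit_prob(profiles, b, b0):
    """Logistic item function (an assumption): P(visit) = 1 / (1 + exp(-(b0 + b.x)))."""
    return 1.0 / (1.0 + np.exp(-(b0 + profiles @ b)))

def posterior(history, prior, profiles, B, B0):
    """Bayes update of the profile distribution; np.nan entries are treated as missing."""
    post = prior.copy()
    for item, observed in enumerate(history):
        if np.isnan(observed):
            continue
        p = visit_prob(profiles, B[item], B0[item])
        post *= p if observed == 1 else (1.0 - p)
    return post / post.sum()

def predict(post, profiles, B, B0):
    """Posterior-weighted visit probability for every item."""
    return np.array([post @ visit_prob(profiles, b, b0) for b, b0 in zip(B, B0)])

# 125 candidate profiles and the binomial prior, in the order used in section 4.2.
profiles = np.array(list(product([-2, -1, 0, 1, 2], repeat=3)), dtype=float)
w = np.array([1, 4, 6, 4, 1]) / 16.0
prior = np.array([w[i] * w[j] * w[k] for i, j, k in product(range(5), repeat=3)])

# Transformed item profiles for bright, natgal, hampt and science (section 3.2).
B = np.array([[0.17916486, 0.02141001, 0.107280622],
              [1.34721208, 0.24193703, -0.301708229],
              [0.80041830, -0.02416434, 0.007786632],
              [0.85536112, 0.06506150, 0.101447062]])
B0 = np.array([-0.67148568, -1.44990555, -1.02481696, -0.06765865])

history = np.array([np.nan, 1.0, 1.0, 1.0])   # Brighton unknown, the other three visited
post = posterior(history, prior, profiles, B, B0)
preds = predict(post, profiles, B, B0)
print(preds)
```

Because it uses a subset of the items, this sketch is not expected to reproduce the probabilities listed above; it only shows the shape of the calculation.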



Make a recommendation 

The recommended attraction is the one with the highest probability of a visit that has not yet been
visited. The attraction with the highest probability of a visit is number 5, the Science Museum. The user has
already visited this, however, so it is not recommended. The recommendation is item 17, the Natural History
Museum. The expected probability is 0.666.
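
The selection rule can be written as a short Python sketch, using the probabilities from section 4.4 and the visiting history from section 4.1. The function name is hypothetical; the original routine was written in S-Plus.

```python
def recommend(probabilities, history):
    """Return the 1-based index and probability of the best item not yet visited."""
    candidates = [(p, i + 1)
                  for i, (p, seen) in enumerate(zip(probabilities, history))
                  if not seen]
    best_p, best_item = max(candidates)
    return best_item, best_p

probs = [0.3870819, 0.4108272, 0.5532911, 0.4876843, 0.7103175, 0.3310440,
         0.4949912, 0.1313193, 0.4609472, 0.3095996, 0.4826755, 0.6374526,
         0.3675939, 0.5743559, 0.4031034, 0.3512299, 0.6664543, 0.5865752,
         0.3916554, 0.1871927]
# natgal (3), hampt (4) and science (5) have been visited.
history = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

item, p = recommend(probs, history)
print(item, p)   # 17 0.6664543
```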
Example 7

A PCA topping based on scores. 

B Step - estimate the item profiles. 

First do a PCA on the covariance matrix. The following is output from
S-PLUS.

> cbind(Dom.pca$b[,1:3], hbar=Dom.pca$hbar)

                 PC1          PC2           PC3        hbar
bright    0.01702424  -0.03265263  -0.412040936  0.33816425
chess    -0.02872608   0.62200723  -0.376592717  0.44605475
natgal    0.20941066  -0.14936054  -0.268636236  0.19001610
hampt     0.19091245  -0.03316651  -0.347284798  0.26409018
science   0.45500923  -0.13794577  -0.038133444  0.48309179
whip      0.12634410   0.06386758  -0.012276090  0.18035427
lego      0.19121826   0.36480031   0.478449889  0.48309179
east      0.01404058  -0.00654658  -0.102627621  0.09661836
lonaqu    0.26664885  -0.06199254   0.233395599  0.30595813
westab    0.07639228  -0.05113437  -0.096709504  0.09500805
kew       0.23023112  -0.02068946  -0.120386433  0.20289855
lonzoo    0.36141969   0.15191398   0.265047262  0.49275362
madamt    0.14627349   0.09109878  -0.134194851  0.18840580
britm     0.23483611  -0.09731590  -0.183014065  0.15942029
oxford    0.11686354  -0.04211381  -0.095154883  0.10789050
thorpe    0.09239023   0.60867948  -0.096328325  0.32206119
nathist   0.46022234  -0.04100992   0.111261162  0.43317230
tower     0.25260849  -0.08283769  -0.147741804  0.24315620
wind      0.14447895   0.05180584  -0.044192512  0.14975845
woburn    0.05506417   0.03430597  -0.003405975  0.08373591



The item profile for bright, for example, is:
b0 = 0.338

b1, b2, b3 = 0.017, -0.032, -0.412

A Step - learn about a case profile 

The user has visited the following attractions. 
> h

bright chess natgal hampt science whip lego east lonaqu westab kew lonzoo
     0     0      1     1       1    0    0    0      0      0   0      0

madamt britm oxford thorpe nathist tower wind woburn
     0     0      0      0       0     0    0      0



This implies a case profile of: 



> (h - Dom.pca$hbar) %*% Dom.pca$b[, 1:3]

       PC1       PC2       PC3
-0.2721838 -0.882913 -0.482576

Y Step - make predictions 

Predicted likelihood for item 1 (i.e. function of user and item profiles) 

> ((h - Dom.pca$hbar) %*% Dom.pca$b[, 1:3]) %*% t(Dom.pca$b[1, 1:3, drop=F]) +
Dom.pca$hbar[1]

  bright
0.561201

Predicted likelihood for each of the items

> ((h - Dom.pca$hbar) %*% Dom.pca$b[, 1:3]) %*% t(Dom.pca$b[, 1:3]) +
Dom.pca$hbar

   bright      chess    natgal     hampt   science       whip       lego
 0.561201 0.08642984 0.3945277 0.4090014 0.4994421 0.09550008 -0.1219301

     east    lonaqu    westab       kew    lonzoo    madamt     britm
0.1481024 0.1754836 0.1660322 0.2165960 0.1323488 0.1329194 0.2697414

   oxford     thorpe   nathist     tower       wind     woburn
0.1591844 -0.1940112 0.2904235 0.3188354 0.08601982 0.04010279

And a recommendation

> recomm(((h - Dom.pca$hbar) %*% Dom.pca$b[, 1:3]) %*% t(Dom.pca$b[, 1:3]) +
Dom.pca$hbar, h)

$item
[1] 1
$P

[1] 0.561201
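
The three steps of this example can be retraced outside S-Plus. The Python sketch below is an illustration using NumPy; the variable names are hypothetical, and the table values are copied from the printout of Dom.pca$b and Dom.pca$hbar given earlier in this example.

```python
import numpy as np

ITEMS = ["bright", "chess", "natgal", "hampt", "science", "whip", "lego",
         "east", "lonaqu", "westab", "kew", "lonzoo", "madamt", "britm",
         "oxford", "thorpe", "nathist", "tower", "wind", "woburn"]

# Columns: PC1, PC2, PC3, hbar.
TABLE = np.array([
    [0.01702424, -0.03265263, -0.412040936, 0.33816425],
    [-0.02872608, 0.62200723, -0.376592717, 0.44605475],
    [0.20941066, -0.14936054, -0.268636236, 0.19001610],
    [0.19091245, -0.03316651, -0.347284798, 0.26409018],
    [0.45500923, -0.13794577, -0.038133444, 0.48309179],
    [0.12634410, 0.06386758, -0.012276090, 0.18035427],
    [0.19121826, 0.36480031, 0.478449889, 0.48309179],
    [0.01404058, -0.00654658, -0.102627621, 0.09661836],
    [0.26664885, -0.06199254, 0.233395599, 0.30595813],
    [0.07639228, -0.05113437, -0.096709504, 0.09500805],
    [0.23023112, -0.02068946, -0.120386433, 0.20289855],
    [0.36141969, 0.15191398, 0.265047262, 0.49275362],
    [0.14627349, 0.09109878, -0.134194851, 0.18840580],
    [0.23483611, -0.09731590, -0.183014065, 0.15942029],
    [0.11686354, -0.04211381, -0.095154883, 0.10789050],
    [0.09239023, 0.60867948, -0.096328325, 0.32206119],
    [0.46022234, -0.04100992, 0.111261162, 0.43317230],
    [0.25260849, -0.08283769, -0.147741804, 0.24315620],
    [0.14447895, 0.05180584, -0.044192512, 0.14975845],
    [0.05506417, 0.03430597, -0.003405975, 0.08373591],
])
b, hbar = TABLE[:, :3], TABLE[:, 3]

h = np.zeros(20)
h[[2, 3, 4]] = 1.0                        # natgal, hampt and science visited

case_profile = (h - hbar) @ b             # A step: project centred history on the PCs
predicted = case_profile @ b.T + hbar     # Y step: predicted likelihood per item

unvisited = np.flatnonzero(h == 0)
best = unvisited[np.argmax(predicted[unvisited])]
print(case_profile, ITEMS[best], predicted[best])
```

Run on the table as printed, this should give the same case profile and the same recommendation (item 1, bright) as the S-PLUS output above.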
Example 8

Example of using the restricted user history for the topping. First get some 
item profiles. 



>iep . jd 




                  b1           b2            b3           b0
bright    0.17916486   0.02141001   0.107280622  -0.67148568
chess    -0.09026066  -0.10180926   1.121838928  -0.21662415
natgal    1.34721208   0.24193703  -0.301708229  -1.44990555
hampt     0.80041830  -0.02416434   0.007786632  -1.02481696
science   0.85536112   0.06506150   0.101447062  -0.06765865
whip      0.25824137   1.57715976   0.147229879  -1.51394915
lego      0.06565695   0.11150264   0.446638983  -0.06765865
east      0.20630971   0.38297223   0.093649385  -2.23537634
lonaqu    0.48703898   0.41119215  -0.133891260  -0.81908402
westab    1.08441820   0.05499653  -0.209305366  -2.25396441
kew       1.03697579   0.27543851   0.081300719  -1.36827586
lonzoo    0.56361160   0.05135782   0.467398672  -0.02898754
madamt    0.71878587   0.22708312   0.297627027  -1.46040233
britm     1.63067053   0.40388941   0.005839960  -1.66254774
oxford    1.35564366  -0.05501297   0.124666452  -2.11247207
thorpe   -0.04584748   0.21426669   1.800349935  -0.74431547
nathist   0.82136797   0.25630094   0.077482099  -0.26891980
tower     1.22543682   0.47203314  -0.005505005  -1.13545286
wind      1.01365495   0.20677286   0.208041754  -1.73649679
woburn   -0.04385657   1.80668272   0.152829077  -2.39263672



Next get the set of observations about the case in question 

> h 

bright chess natgal hampt science whip lego east lonaqu
     0     0      1     1       1    0    0    0      0

westab kew lonzoo madamt britm oxford thorpe nathist tower
     0   0      0      0     0      0      0       0     0

wind woburn
   0      0

We want to know whether this person is likely to go to Brighton next. So before 
updating knowledge of her profile we replace the first observation 
with a missing. 

> h.1

bright chess natgal hampt science whip lego east lonaqu
    NA     0      1     1       1    0    0    0      0

westab kew lonzoo madamt britm oxford thorpe nathist tower
     0   0      0      0     0      0      0       0     0

wind woburn
   0      0

Now start with the prior distribution over possible user profiles. 

> prior 
$x 





        [,1] [,2] [,3]
  [1,]   -2   -2   -2
  [2,]   -2   -2   -1
  [3,]   -2   -2    0
  [4,]   -2   -2    1
  [5,]   -2   -2    2
  [6,]   -2   -1   -2
  [7,]   -2   -1   -1
  [8,]   -2   -1    0
  [9,]   -2   -1    1
 [10,]   -2   -1    2
 [11,]   -2    0   -2
 [12,]   -2    0   -1
 [13,]   -2    0    0
 [14,]   -2    0    1
 [15,]   -2    0    2
 [16,]   -2    1   -2
 [17,]   -2    1   -1
 [18,]   -2    1    0
 [19,]   -2    1    1
 [20,]   -2    1    2
 [21,]   -2    2   -2
 [22,]   -2    2   -1
 [23,]   -2    2    0
 [24,]   -2    2    1
 [25,]   -2    2    2
 [26,]   -1   -2   -2
 [27,]   -1   -2   -1
 [28,]   -1   -2    0
 [29,]   -1   -2    1
 [30,]   -1   -2    2
 [31,]   -1   -1   -2
 [32,]   -1   -1   -1
 [33,]   -1   -1    0
 [34,]   -1   -1    1
 [35,]   -1   -1    2
 [36,]   -1    0   -2
 [37,]   -1    0   -1
 [38,]   -1    0    0
 [39,]   -1    0    1
 [40,]   -1    0    2
 [41,]   -1    1   -2
 [42,]   -1    1   -1
 [43,]   -1    1    0
 [44,]   -1    1    1
 [45,]   -1    1    2
 [46,]   -1    2   -2
 [47,]   -1    2   -1
 [48,]   -1    2    0
 [49,]   -1    2    1
 [50,]   -1    2    2
 [51,]    0   -2   -2
 [52,]    0   -2   -1
 [53,]    0   -2    0
 [54,]    0   -2    1
 [55,]    0   -2    2
 [56,]    0   -1   -2
 [57,]    0   -1   -1
 [58,]    0   -1    0
 [59,]    0   -1    1
 [60,]    0   -1    2
 [61,]    0    0   -2
 [62,]    0    0   -1
 [63,]    0    0    0
 [64,]    0    0    1
 [65,]    0    0    2
 [66,]    0    1   -2
 [67,]    0    1   -1
 [68,]    0    1    0
 [69,]    0    1    1
 [70,]    0    1    2
 [71,]    0    2   -2
 [72,]    0    2   -1
 [73,]    0    2    0
 [74,]    0    2    1
 [75,]    0    2    2
 [76,]    1   -2   -2
 [77,]    1   -2   -1
 [78,]    1   -2    0
 [79,]    1   -2    1
 [80,]    1   -2    2
 [81,]    1   -1   -2
 [82,]    1   -1   -1
 [83,]    1   -1    0
 [84,]    1   -1    1
 [85,]    1   -1    2
 [86,]    1    0   -2
 [87,]    1    0   -1
 [88,]    1    0    0
 [89,]    1    0    1
 [90,]    1    0    2
 [91,]    1    1   -2
 [92,]    1    1   -1
 [93,]    1    1    0
 [94,]    1    1    1
 [95,]    1    1    2
 [96,]    1    2   -2
 [97,]    1    2   -1
 [98,]    1    2    0
 [99,]    1    2    1
[100,]    1    2    2
[101,]    2   -2   -2
[102,]    2   -2   -1
[103,]    2   -2    0
[104,]    2   -2    1
[105,]    2   -2    2
[106,]    2   -1   -2
[107,]    2   -1   -1
[108,]    2   -1    0
[109,]    2   -1    1
[110,]    2   -1    2
[111,]    2    0   -2
[112,]    2    0   -1
[113,]    2    0    0
[114,]    2    0    1
[115,]    2    0    2
[116,]    2    1   -2
[117,]    2    1   -1
[118,]    2    1    0
[119,]    2    1    1
[120,]    2    1    2
[121,]    2    2   -2
[122,]    2    2   -1
[123,]    2    2    0
[124,]    2    2    1
[125,]    2    2    2

$density

  [1] 0.0002441406 0.0009765625 0.0014648438 0.0009765625 0.0002441406
  [6] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [11] 0.0014648438 0.0058593750 0.0087890625 0.0058593750 0.0014648438
 [16] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [21] 0.0002441406 0.0009765625 0.0014648438 0.0009765625 0.0002441406
 [26] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [31] 0.0039062500 0.0156250000 0.0234375000 0.0156250000 0.0039062500
 [36] 0.0058593750 0.0234375000 0.0351562500 0.0234375000 0.0058593750
 [41] 0.0039062500 0.0156250000 0.0234375000 0.0156250000 0.0039062500
 [46] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [51] 0.0014648438 0.0058593750 0.0087890625 0.0058593750 0.0014648438
 [56] 0.0058593750 0.0234375000 0.0351562500 0.0234375000 0.0058593750
 [61] 0.0087890625 0.0351562500 0.0527343750 0.0351562500 0.0087890625
 [66] 0.0058593750 0.0234375000 0.0351562500 0.0234375000 0.0058593750
 [71] 0.0014648438 0.0058593750 0.0087890625 0.0058593750 0.0014648438
 [76] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
 [81] 0.0039062500 0.0156250000 0.0234375000 0.0156250000 0.0039062500
 [86] 0.0058593750 0.0234375000 0.0351562500 0.0234375000 0.0058593750
 [91] 0.0039062500 0.0156250000 0.0234375000 0.0156250000 0.0039062500
 [96] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
[101] 0.0002441406 0.0009765625 0.0014648438 0.0009765625 0.0002441406
[106] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
[111] 0.0014648438 0.0058593750 0.0087890625 0.0058593750 0.0014648438
[116] 0.0009765625 0.0039062500 0.0058593750 0.0039062500 0.0009765625
[121] 0.0002441406 0.0009765625 0.0014648438 0.0009765625 0.0002441406



Update this in the light of the modified set of observations 



> do.user.dist(h.1, prior, lep.b$b)
$x

$density

  [1] 7.672890e-05 1.635089e-04 7.794280e-05 6.213913e-06 7.011357e-08
  [6] 3.490438e-04 7.365193e-04 3.371046e-04 2.454116e-05 2.592575e-07
 [11] 5.127550e-04 1.050861e-03 4.500308e-04 2.932081e-05 2.853203e-07
 [16] 1.994830e-04 3.748035e-04 1.406532e-04 7.733731e-06 6.548919e-08
 [21] 8.512749e-06 1.395594e-05 4.387817e-06 1.987583e-07 1.447813e-09
 [26] 3.243640e-03 6.676244e-03 3.055914e-03 2.312031e-04 2.394440e-06
 [31] 1.316148e-02 2.676985e-02 1.173815e-02 8.080382e-04 7.789287e-06
 [36] 1.647478e-02 3.243054e-02 1.324935e-02 8.112264e-04 7.144813e-06
 [41] 5.058183e-03 9.079432e-03 3.231540e-03 1.656939e-04 1.259165e-06
 [46] 1.585460e-04 2.482305e-04 7.398349e-05 3.118099e-06 2.033852e-08
 [51] 2.317040e-02 4.560712e-02 1.967198e-02 1.381112e-03 1.282665e-05
 [56] 7.349274e-02 1.429598e-01 5.904360e-02 3.764450e-03 3.239402e-05
 [61] 6.641006e-02 1.247488e-01 4.787870e-02 2.703213e-03 2.111866e-05
 [66] 1.317223e-02 2.247279e-02 7.489297e-03 3.526124e-04 2.366655e-06
 [71] 2.452715e-04 3.653819e-04 1.022277e-04 3.964182e-06 2.290390e-08
 [76] 1.318247e-02 2.483070e-02 1.008892e-02 6.572711e-04 5.467797e-06
 [81] 2.589950e-02 4.807970e-02 1.871111e-02 1.109051e-03 8.560060e-06
 [86] 1.320545e-02 2.349219e-02 8.465754e-03 4.438305e-04 3.110557e-06
 [91] 1.341369e-03 2.145120e-03 6.683755e-04 2.922139e-05 1.767116e-07
 [96] 1.250612e-05 1.736093e-05 4.543827e-06 1.644732e-07 8.654834e-10
[101] 1.561765e-04 2.734836e-04 1.044944e-04 6.471019e-06 5.038589e-08
[106] 1.647185e-04 2.803943e-04 1.018283e-04 5.727670e-06 4.150223e-08
[111] 4.446394e-05 7.130991e-05 2.371643e-05 1.173679e-06 7.724482e-09
[116] 2.383790e-06 3.386293e-06 9.657758e-07 3.976672e-08 2.268751e-10
[121] 1.265075e-08 1.549984e-08 3.708606e-09 1.267636e-10 6.344982e-13



Get the predicted likelihood of visiting the first attraction 

> do.pred(lep.b, h.1, 1, prior)
[1] 0.312789

Repeat this for each attraction, recalculating the posterior each time. This 
gives : 

>mh(lep.b, h, 1:20, prior) 

[1] 0.31278903 0.27180617 0.16427276 0.24566550 0.41710747 0.12806525 
[7] 0.36447443 0.07352558 0.29817359 0.13808571 0.19315128 0.39286417 

[13] 0.14204873 0.18939037 0.13652884 0.13132923 0.40522199 0.24230986 

[19] 0.13127001 0.06436074 



And a recommendation 

> recomm(mh(lep.b, h, 1:20, prior), h)

$item

[1] 17

$P

[1] 0.405222
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Example 9 

DATE: 6/26/2001 
TIME: 15:06 

L I S R E L 8.30 
BY 

Karl G. Jöreskog & Dag Sörbom



This program is published exclusively by 
Scientific Software International, Inc. 
7383 N. Lincoln Avenue, Suite 100
Lincolnwood, IL 60712, U.S.A. 
Phone: (800)247-6113, (847)675-0720, Fax: (847)675-2140 
Copyright by Scientific Software International, Inc., 1981-2000 
Use of this program is subject to the terms specified in the 
Universal Copyright Convention. 
Website: www.ssicentral.com 

The following lines were read from file C:\WINDOWS\DESKTOP\LISREIi\l006\LA3.LPJ: 

This example uses prior knowledge about the attractions in order to build a
model which may be more readily interpreted. We have defined 5 characteristics
that people may value when choosing an attraction:

SW fringes
Beach
Museum
Animals
Adventure park

We then assumed a latent trait for each characteristic, and fixed the loading
to be 0 for those attractions that we considered did not indicate that trait.

We added 2 further latent traits, one each for Oxford and Madame Tussauds. We
did not consider that either indicated any of the other characteristics. For
these two, only one loading is free: on Oxford for Oxford, and on Madame
Tussauds for Madame Tussauds. To prevent estimation problems we fixed the value
of the unique variance to be 0.3 for both attractions.

DA NI=21 NO=624 MA=PM
Labels ;
BRIGHT CHESS NATGAL HAMPTON SCIENCE WHIP LEGO EAST LAQUA WABBEY KEW LZOO
MTUSS BRITM OXFORD THORPE NATHIST TOWER WINDSOR WOBURN OLDKID
PM FI = LAkids.cma
AC FI = LAkids.acc
SE
BRIGHT CHESS NATGAL HAMPTON SCIENCE WHIP LEGO EAST LAQUA WABBEY KEW LZOO
MTUSS BRITM OXFORD THORPE NATHIST TOWER WINDSOR WOBURN /

MO NX=20 NK=7 TD=DI



PA LX
*
0 1 0 0 0 0 0 ! Brighton
1 0 0 0 1 0 0 ! Chessington
0 0 1 0 0 0 0 ! National Gallery
1 0 0 0 0 0 0 ! Hampton Court Gardens
0 0 1 0 0 0 0 ! Science Museum
0 0 0 1 0 0 0 ! Whipsnade
1 0 0 0 0 0 0 ! Lego Land
0 1 0 0 0 0 0 ! Eastbourne
0 0 0 1 0 0 0 ! London Aquarium
0 0 1 0 0 0 0 ! Westminster Abbey
1 0 0 0 0 0 0 ! Kew
0 0 0 1 0 0 0 ! London Zoo
0 0 0 0 0 0 1 ! Madam Tussauds
0 0 1 0 0 0 0 ! British Museum
0 0 0 0 0 1 0 ! Oxford
1 0 0 0 1 0 0 ! Thorpe Park
0 0 1 1 0 0 0 ! Natural History Museum
0 0 1 0 0 0 0 ! Tower of London
1 0 0 0 0 0 0 ! Windsor Castle
0 0 0 1 0 0 0 ! Woburn
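
As a cross-check on the loading pattern, the same free/fixed layout can be written out as a small Python table and queried. This is an illustrative sketch only: the dictionary name `PATTERN` and the helper functions are hypothetical, and the short variable labels are taken from the Labels line of the listing.

```python
# Free/fixed pattern of the loadings: 1 = free, 0 = fixed at zero.
# Columns are the seven latent traits KSI 1 to KSI 7, in that order.
PATTERN = {
    "BRIGHT":  (0, 1, 0, 0, 0, 0, 0),
    "CHESS":   (1, 0, 0, 0, 1, 0, 0),
    "NATGAL":  (0, 0, 1, 0, 0, 0, 0),
    "HAMPTON": (1, 0, 0, 0, 0, 0, 0),
    "SCIENCE": (0, 0, 1, 0, 0, 0, 0),
    "WHIP":    (0, 0, 0, 1, 0, 0, 0),
    "LEGO":    (1, 0, 0, 0, 0, 0, 0),
    "EAST":    (0, 1, 0, 0, 0, 0, 0),
    "LAQUA":   (0, 0, 0, 1, 0, 0, 0),
    "WABBEY":  (0, 0, 1, 0, 0, 0, 0),
    "KEW":     (1, 0, 0, 0, 0, 0, 0),
    "LZOO":    (0, 0, 0, 1, 0, 0, 0),
    "MTUSS":   (0, 0, 0, 0, 0, 0, 1),
    "BRITM":   (0, 0, 1, 0, 0, 0, 0),
    "OXFORD":  (0, 0, 0, 0, 0, 1, 0),
    "THORPE":  (1, 0, 0, 0, 1, 0, 0),
    "NATHIST": (0, 0, 1, 1, 0, 0, 0),
    "TOWER":   (0, 0, 1, 0, 0, 0, 0),
    "WINDSOR": (1, 0, 0, 0, 0, 0, 0),
    "WOBURN":  (0, 0, 0, 1, 0, 0, 0),
}


def free_loadings(pattern):
    """Count the loadings that are left free for estimation."""
    return sum(sum(row) for row in pattern.values())


def indicators(pattern, trait):
    """List the attractions with a free loading on the 1-based trait number."""
    return [name for name, row in pattern.items() if row[trait - 1] == 1]


n_free = free_loadings(PATTERN)
oxford_trait = indicators(PATTERN, 6)
tussauds_trait = indicators(PATTERN, 7)
```

The count of 23 free loadings agrees with the numbering 1 to 23 in the LAMBDA-X parameter specification printed by the program, and traits 6 and 7 each have a single indicator, as described in the title text.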



PA PH
*
1
1 1
1 1 1
1 1 1 1
1 1 1 0 1
1 1 1 1 1 1
1 1 1 1 1 1 1
! 0 0 0 0 0 0 0 1
! 0 0 0 0 0 0 0 0 1



PA TD
*
1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1




VA 0.3 TD(15,15) TD(13,13)
!Path diagram 
OU AD = 200 SE MI 

This example uses prior knowledge about the attractions in order to build a mod 

Number of Input Variables 21 

Number of Y - Variables 0 

Number of X - Variables 20 

Number of ETA - Variables 0 

Number of KSI - Variables 7 

Number of Observations 624 



This example uses prior knowledge about the attractions in order to build a mod 
Correlation Matrix to be Analyzed

              BRIGHT        CHESS       NATGAL      HAMPTON
BRIGHT          1.00
CHESS           0.03         1.00
NATGAL          0.16  [illegible]         1.00
HAMPTON         0.24  [illegible]  [illegible]         1.00
SCIENCE         0.04  [illegible]         0.38         0.23
WHIP            0.00  [illegible]         0.14         0.04
LEGO           -0.10  [illegible]        -0.12         0.03
EAST            0.17  [illegible]         0.09         0.00
LAQUA           0.01        -0.10         0.06  [illegible]
WABBEY          0.08        -0.03         0.37         0.15
KEW            -0.01         0.01         0.34         0.37
LZOO            0.02        -0.08         0.03         0.05
MTUSS          -0.01         0.21         0.16         0.00
BRITM           0.10        -0.02         0.51         0.22
OXFORD          0.05        -0.11         0.31         0.23
THORPE          0.05         0.51        -0.14        -0.01
NATHIST        -0.12        -0.02         0.25         0.10
TOWER          -0.01        -0.10         0.18         0.17
WINDSOR        -0.05         0.01         0.19         0.30
WOBURN          0.01        -0.01         0.08        -0.02


Correlation Matrix to be Analyzed

                LEGO         EAST        LAQUA       WABBEY
LEGO            1.00
EAST           -0.24         1.00
LAQUA           0.17        -0.09         1.00
WABBEY         -0.08         0.24         0.10         1.00
KEW             0.10        -0.04         0.19         0.18
LZOO           •0.19        -0.01         0.28         0.12
MTUSS          -0.01         0.09         0.04         0.24
BRITM          -0.11         0.09         0.23         0.31
OXFORD         -0.01         0.03         0.15         0.34
THORPE          0.24         0.07        -0.05        -0.11
NATHIST         0.15         0.07         0.26         0.08
TOWER           0.00         0.25         0.15         0.40
WINDSOR         0.33         0.04         0.12         0.34
WOBURN          0.08         0.20         0.20         0.00



SCIENCE 



1.00 
0.09 
0.09 

-0.02 
0.21 

-0.01 
0.26 
0.23 
0.08 



WHIP 



35 
12 
04 
48 
17 
02 



0.02 



1. 00 
0.23 
0.11 
0.25 
0.43 
0. 04 
0.16 
0.23 
0.25 
0.12 



1.00 
0.12 
0.17 
0. 05 
-0. 02 
0. 05 
0.12 
0. 09 



08 
15 
13 
22 
25 
18 
65 



1. 00 
0.15 
0.17 
0.11 



23 
26 
14 
14 



0. 04 
Correlation Matrix to be Analyzed

             MTUSS    BRITM   OXFORD   THORPE  NATHIST    TOWER
MTUSS         1.00
BRITM         0.27     1.00
OXFORD        0.24     0.36     1.00
THORPE        0.15     0.00     0.03     1.00
NATHIST       0.23     0.39     0.09     0.04     1.00
TOWER         0.34     0.42     0.31     0.01     0.23     1.00
WINDSOR       0.24     0.23     0.43     0.10     0.05     0.42
WOBURN        0.12     0.15    -0.04     0.20     0.12     0.19

Correlation Matrix to be Analyzed

           WINDSOR   WOBURN
WINDSOR       1.00
WOBURN        0.09     1.00



This example uses prior knowledge about the attractions in order to build a mod 
Parameter Specifications 
LAMBDA-X

             KSI 1    KSI 2    KSI 3    KSI 4    KSI 5    KSI 6    KSI 7
BRIGHT           0        1        0        0        0        0        0
CHESS            2        0        0        0        3        0        0
NATGAL           0        0        4        0        0        0        0
HAMPTON          5        0        0        0        0        0        0
SCIENCE          0        0        6        0        0        0        0
WHIP             0        0        0        7        0        0        0
LEGO             8        0        0        0        0        0        0
EAST             0        9        0        0        0        0        0
LAQUA            0        0        0       10        0        0        0
WABBEY           0        0       11        0        0        0        0
KEW             12        0        0        0        0        0        0
LZOO             0        0        0       13        0        0        0
MTUSS            0        0        0        0        0        0       14
BRITM            0        0       15        0        0        0        0
OXFORD           0        0        0        0        0       16        0
THORPE          17        0        0        0       18        0        0
NATHIST          0        0       19       20        0        0        0
TOWER            0        0       21        0        0        0        0
WINDSOR         22        0        0        0        0        0        0
WOBURN           0        0        0       23        0        0        0



PHI

             KSI 1    KSI 2    KSI 3    KSI 4    KSI 5    KSI 6    KSI 7
KSI 1            0
KSI 2           24        0
KSI 3           25       26        0
KSI 4           27       28       29        0
KSI 5           30       31       32        0        0
KSI 6           33       34       35       36       37        0
KSI 7           38       39       40       41       42       43        0

THETA-DELTA

  BRIGHT    CHESS   NATGAL  HAMPTON  SCIENCE     WHIP
      44       45       46       47       48       49

    LEGO     EAST    LAQUA   WABBEY      KEW     LZOO
      50       51       52       53       54       55

   MTUSS    BRITM   OXFORD   THORPE  NATHIST    TOWER
       0       56        0       57       58       59

 WINDSOR   WOBURN
      60       61

This example uses prior knowledge about the attractions in order to build a mod 
Number of Iterations =35 
LISREL Estimates (Weighted Least Squares)

LAMBDA-X (estimate, standard error in parentheses, t-value)

BRIGHT    KSI 2    0.41  (0.06)    6.55
CHESS     KSI 1    0.14  (0.11)    1.31
CHESS     KSI 5    0.96  (0.17)    5.78
NATGAL    KSI 3    0.79  (0.04)   21.01
HAMPTON   KSI 1    0.66  (0.05)   14.63
SCIENCE   KSI 3    0.60  (0.03)   19.43
WHIP      KSI 4    0.74  (0.04)   18.64
LEGO      KSI 1    0.36  (0.04)    9.01
EAST      KSI 2    0.75  (0.11)    7.04
LAQUA     KSI 4    0.53  (0.05)   10.99
WABBEY    KSI 3    0.52  (0.05)    9.78
KEW       KSI 1    0.75  (0.05)   15.33
LZOO      KSI 4    0.40  (0.04)    9.80
MTUSS     KSI 7    0.84  (0.02)   34.94
BRITM     KSI 3    0.82  (0.04)   18.84
OXFORD    KSI 6    0.84  (0.02)   34.94
THORPE    KSI 1    0.19  (0.08)    2.28
THORPE    KSI 5    0.62  (0.11)    5.58
NATHIST   KSI 3    0.63  (0.08)    7.99
NATHIST   KSI 4   -0.03  (0.09)   -0.37
TOWER     KSI 3    0.68  (0.04)   18.51
WINDSOR   KSI 1    0.74  (0.05)   13.75
WOBURN    KSI 4    0.96  (0.06)   16.12

PHI

           KSI 1    KSI 2    KSI 3    KSI 4    KSI 5    KSI 6    KSI 7

KSI 1       1.00

KSI 2       0.43     1.00
           (0.10)
            4.46

KSI 3       0.65     0.56     1.00
           (0.05)   (0.10)
           14.34     5.73

KSI 4       0.49     0.63     0.65     1.00
           (0.06)   (0.10)   (0.05)
            8.20     6.14    13.30

KSI 5       0.15     0.15    -0.04     - -      1.00
           (0.12)   (0.09)   (0.08)
            1.27     1.60    -0.55

KSI 6       0.62     0.20     0.42     0.19     0.00     1.00
           (0.07)   (0.12)   (0.07)   (0.09)   (0.10)
            8.85     1.71     6.13     2.17     0.03

KSI 7       0.43     0.50     0.67     0.50     0.23     0.30     1.00
           (0.07)   (0.12)   (0.06)   (0.07)   (0.08)   (0.09)
            5.76     4.10    10.89     7.04     2.84     3.18

THETA-DELTA (estimate, standard error in parentheses, t-value)

BRIGHT     0.84  (0.06)   13.01
CHESS      0.03  (0.31)    0.10
NATGAL     0.37  (0.07)    5.21
HAMPTON    0.56  (0.07)    7.85
SCIENCE    0.64  (0.05)   11.69
WHIP       0.45  (0.07)    6.27
LEGO       0.87  (0.05)   17.59
EAST       0.44  (0.16)    2.66
LAQUA      0.72  (0.06)   11.16
WABBEY     0.73  (0.07)   10.79
KEW        0.43  (0.08)    5.12
LZOO       0.84  (0.05)   16.35
MTUSS      0.30
BRITM      0.33  (0.08)   [illegible]
OXFORD     0.30
THORPE     0.54  (0.14)    3.76
NATHIST    0.62  (0.06)   10.50
TOWER      0.53  (0.06)    8.30
WINDSOR    0.45  (0.09)    4.97
WOBURN     0.08  (0.12)    0.65

Squared Multiple Correlations for X - Variables

  BRIGHT    CHESS   NATGAL  HAMPTON  SCIENCE     WHIP
    0.16     0.97     0.63     0.44     0.36     0.55

    LEGO     EAST    LAQUA   WABBEY      KEW     LZOO
    0.13     0.56     0.28     0.27     0.57     0.16

   MTUSS    BRITM   OXFORD   THORPE  NATHIST    TOWER
    0.70     0.67     0.70     0.46     0.38     0.47

 WINDSOR   WOBURN
    0.55     0.92
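
The squared multiple correlation of an attraction that loads on two traits can be checked by hand. The sketch below is an illustration only: the function name `communality` is hypothetical, and it assumes that the THORPE loadings are 0.19 on KSI 1 and 0.62 on KSI 5 (read off the estimates listing) and that the traits have unit variance with a KSI 1/KSI 5 correlation of 0.15, as in the PHI estimates.

```python
def communality(loadings, phi):
    """Variance of an item explained by standardised, correlated traits.

    `loadings` maps trait number -> loading; `phi` maps an ordered pair of
    trait numbers (smaller first) -> correlation between those traits.
    """
    total = 0.0
    for i, load_i in loadings.items():
        for j, load_j in loadings.items():
            corr = 1.0 if i == j else phi[(min(i, j), max(i, j))]
            total += load_i * load_j * corr
    return total


thorpe_loadings = {1: 0.19, 5: 0.62}   # assumed reading of the listing
trait_correlations = {(1, 5): 0.15}    # PHI estimate for KSI 5 with KSI 1

r_squared = communality(thorpe_loadings, trait_correlations)
unique_variance = 1.0 - r_squared
```

Under those assumptions the explained variance comes to about 0.46 and the unique variance to about 0.54, which agrees with the values printed for THORPE.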

Goodness of Fit Statistics

Degrees of Freedom = 149
Minimum Fit Function Chi-Square = 381.65 (P = 0.0)
Estimated Non-centrality Parameter (NCP) = 232.65
90 Percent Confidence Interval for NCP = (178.79 ; 294.19)

Minimum Fit Function Value = 0.61
Population Discrepancy Function Value (F0) = 0.37
90 Percent Confidence Interval for F0 = (0.29 ; 0.47)
Root Mean Square Error of Approximation (RMSEA) = 0.050
90 Percent Confidence Interval for RMSEA = (0.044 ; 0.056)
P-Value for Test of Close Fit (RMSEA < 0.05) = 0.48

Expected Cross-Validation Index (ECVI) = 0.81
90 Percent Confidence Interval for ECVI = (0.72 ; 0.91)
ECVI for Saturated Model = 0.67
ECVI for Independence Model = 3.01

Chi-Square for Independence Model with 190 Degrees of Freedom = 1837.13

Independence AIC = 1877.13
Model AIC = 503.65
Saturated AIC = 420.00
Independence CAIC = 1985.85
Model CAIC = 835.25
Saturated CAIC = 1561.59

Normed Fit Index (NFI) = 0.79
Non-Normed Fit Index (NNFI) = 0.82
Parsimony Normed Fit Index (PNFI) = 0.62
Comparative Fit Index (CFI) = 0.86
Incremental Fit Index (IFI) = 0.86
Relative Fit Index (RFI) = 0.74

Critical N (CN) = 314.54

Root Mean Square Residual (RMR) = 0.16
Standardized RMR = 0.16
Goodness of Fit Index (GFI) = 0.97
Adjusted Goodness of Fit Index (AGFI) = 0.96
Parsimony Goodness of Fit Index (PGFI) = 0.69
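
Several of the fit statistics above follow from the two chi-square values by simple arithmetic. The sketch below is illustrative only: the function name `fit_summary` is hypothetical, and the formulas are the commonly quoted textbook definitions, assumed here rather than taken from the program's documentation. The inputs are the values printed in the listing, with 61 free parameters and 20 analysed variables.

```python
import math


def fit_summary(chi2, df, chi2_indep, df_indep, n_obs, n_free, n_vars):
    """Derive common fit indices from model and independence chi-squares."""
    n = n_obs - 1
    ncp = max(chi2 - df, 0.0)                 # non-centrality parameter
    f0 = ncp / n                              # population discrepancy
    model_aic = chi2 + 2 * n_free
    nfi = 1.0 - chi2 / chi2_indep
    denom = max(chi2_indep - df_indep, ncp)
    cfi = 1.0 if denom == 0 else 1.0 - ncp / denom
    return {
        "NCP": ncp,
        "F0": f0,
        "RMSEA": math.sqrt(f0 / df),
        "fit_value": chi2 / n,
        "model_AIC": model_aic,
        "saturated_AIC": float(n_vars * (n_vars + 1)),
        "independence_AIC": chi2_indep + 2 * n_vars,
        "ECVI": model_aic / n,
        "NFI": nfi,
        "PNFI": (df / df_indep) * nfi,
        "CFI": cfi,
    }


stats = fit_summary(
    chi2=381.65, df=149, chi2_indep=1837.13, df_indep=190,
    n_obs=624, n_free=61, n_vars=20,
)
```

With these inputs the sketch gives values that round to those in the listing, for example an NCP of 232.65, an RMSEA of about 0.050, a model AIC of 503.65 and a CFI of about 0.86.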

This example uses prior knowledge about the attractions in order to build a mod 
Modification Indices and Expected Change 



Modification Indices for LAMBDA-X

KSI 1 KSI 2 KSI 3 KSI 4 KSI 5 KSI 6

0.33
0.20
0.49
2.02

TOWER
0.
0.07
1.68
5.20
11.17
2.72
0.43
2.77
9.80
0.03
29.98
0.38
17.27
KSI 7

BRIGHT     0.27
CHESS      1.07
NATGAL     0.51
HAMPTON    6.20
SCIENCE    9.54
WHIP       2.24
LEGO       7.32



LAQUA 

WABBEY 

KEW 

LZOO 

MTUSS 

BRITM 

OXFORD 

THORPE 

NATHIST 

TOWER 

WINDSOR 

WOBURN 



0.33 
0.58 
0.08 
13.18 

0. 01 

1.07 

9.13 

0.23 

14.42 

0.94 

Expected Change for LAMBDA -X 





KSI 1 


KSI 2 


KSI 3 


KSI 4 


KSI 5 


KSI 6 
-0.03
-0.05
0.33

LZOO
0.31
0.26
0.40

0.05
0.04
-0.02
-0.04






0.04 


-0.09 


TOWER 


0.22 


0.08 




0.13 


0.01 


0.11 


WINDSOR 




0.21 


0.29 


0.14 


-0.04 


-0.20 


WOBURN 


-0 .31 


0.03 


-0.75 




0.04 


-0.42 



Expected Change for LAMBDA-X
KSI 7

BRIGHT     0.07
CHESS      0.13
NATGAL    -0.08
HAMPTON   -0.20
SCIENCE   -0.32
WHIP      -0.16
LEGO      -0.21
EAST      -0.14
LAQUA      0.06
WABBEY     0.10
KEW        0.03
0.33
-0.01
THORPE    -0.08
NATHIST    0.33
0.34
WOBURN    -0.12



No Non-Zero Modification Indices for PHI 

Modification Indices for THETA-DELTA

0.09

WINDSOR WOBURN

WINDSOR - -

WOBURN 6.98

Expected Change for THETA-DELTA

WINDSOR WOBURN

WINDSOR - -

WOBURN -0.22 - -



Maximum Modification Index is 51.51 for Element (20, 6) of THETA- DELTA 

The Problem used 297584 Bytes (= 0.4% of Available Workspace) 
Time used: 12.910 Seconds 
0 0 1110 0

0 10 110 0 

1 1 0 0 0 1 1 
0 0 0 0 0 0 1 
0 0 0 0 1 0 1 
0 0 0 1 0 0 0
0 11110 0 

0 0 0 0 0 0 1 
110 10 0 0 

1 0 1 0 0 0 0 
110 0 10 0 
110 0 10 0 

0 0 0 0 0 0 1 

1 0 1 0 0 0 0 

0 0 0 0 1 0 0 

1 0 0 0 0 1 0 
0 0 0 0 1 0 0 
0 0 1110 1 
0 10 0 10 0 
0 10 0 10 1 
0 10 0 10 1 

0 0 0 0 1 0 0 

1 0 0 0 1 1 0 
10 10 10 0 
1 1 0 0 0 1 0 
0 10 1110 
0 0 0 0 1 0 1 
0 1 0 0 0 0 1 
0 0 10 111 
10 110 0 0 
0 0 0 0 0 0 0 
0 10 10 0 1 
0 10 10 0 1 
0 10 0 110 
110 0 10 0 
0 11110 1 
0 1 0 0 0 1 0 

0 10 10 10 

1 1 0 0 0 0 1 



The data 

0 0 0 0 0 0 0 

0 0 0 0 0 0 0 

0 0 0 0 0 0 0 

0 0 0 0 1 0 0 

0 0 0 1 0 0 0 

0 0 0 0 0 0 1 

0 0 0 0 0 0 0 

0 1 0 0 0 0 0 

0 0 0 0 0 0 0 

0 0 1 0 0 0 0 

0 0 0 0 0 0 0 

0 0 0 0 0 0 0 

0 0 0 0 1 0 0 

0 0 0 1 0 0 0 

0 0 0 0 0 0 0 

0 0 0 0 1 0 0 

0 0 0 0 0 1 0 

0 0 0 1 0 0 0 

0 0 0 0 0 0 0 

0 0 10 10 0 

0 1 0 0 0 1 0 

0 0 0 0 0 0 1 

0 1 0 0 0 0 0 

1 1 1 0 0 0 1 

0 0 0 0 1 1 0 

1 0 0 0 0 0 0 
0 0 0 1 0 1 0 
0 0 0 0 0 1 0 
0 10 0 10 0 
0 0 0 1 0 0 0 
0 10 0 10 0 

0 0 1 0 0 0 0 

1 0 0 0 0 0 0 
0 0 0 0 1 1 0 
0 0 0 0 1 0 0 
10 0 110 0 
0 1 0 0 0 0 0 

0 0 0 0 0 0 0

110 0 10 0

0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0

0 0 1 0 0 0 0

0 0 0 0 0 0 0

1 0 0 0 0 0 0
0 0 0 0 0 0 1
0 0 1 0 0 0 0

0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0

0 0 0 0 0 0 1 

0 0 1 0 0 0 1 

0 0 0 0 0 0 1 

0 0 1 0 0 0 0 

0 0 0 0 0 0 0

0 0 1 0 0 0 1

0 0 0 0 0 0 0

0 1 1 0 0 0 0
0 11110 0

0 0 0 0 0 0 0

0 0 1 0 0 0 1 

0 0 1 0 0 0 0 

10 110 0 1 

0 0 0 1 1 0 1 

0 1 1 0 0 0 1 

1 0 1 0 0 0 0 
0 0 0 0 0 0 1 
0 0 1 0 0 0 0 

0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0

0 0 1 0 0 0 0 

0 1 1 0 0 0 0 

0 1 1 0 0 0 0 

1 0 0 0 0 0 1 
0 0 10 10 1 
0 10 10 0 0 
1111000000
0100100010
0111111010
0010001101
1100001111 
0000001100 
1000000001 
1000000100 
0110100000 
0000000000 

0000000000

0100000000
0000000010
0100001000
1110000000
0100001000

0000000000

0000100100 
0000001010 
0000101000 
0000100001 
0000000010 
1000101000 
0101000010 
0001000000
0100001000 
0100010000 
0000100000 
1000001000 
0000010010 
0100101010
1110000000 
1001000100 
0100111000 
1100000000 
0001001010 
0100000000 
00000000000
10001100001 

01110111101
01001001100 
01011000000 
00000011001 
00000010000 
00000001100 
00010010010 

01000011000
00100011000
01000100001 
00100010000 
11000000000 
00000000001 
00000010000 
01100010000 
00000011000 

00000010000

00000010000 
00000001001 
11000000000 
00000001000 
00000100000 
01000100000 
00100000001 
00000100001 
10000001001 
01000000000 
01000000000 
00000100001 
00000000000 
00000000000 
00000001000 
00000010000
00000000001 
00100001001 
101000000000010000000

000000001011000000000 

100100000100000000000 

110000000000000100000 

110100000000000000001 

100001000000000010000 

010000000001000010001 

010010000000000100000 

010000000001000100000 

110100000000000000000

000000101001000000000

001010100000000000000 

010000100000000100001 

011000000000010000000 

000010101000000000000 

000011100000000000000 

100000100000000000010 

010000000000000010101

110000100000000000001 

100010100000000000000 

000010101000000000000 

100010000000000010001 

000100100001000000000 

000110000000010000000 

010010100000000000000 

111000000000000000000 

100000110000000000000 

100000000000010001000 

110000001000000000000 

100000100000000000000 

000000101100000010000 

000010101000000000000 

000010100001000000000 

100100000000000001000 

001000000010001000000 

000010000011000000000

000111001011000000011
100110000000000000000

001000100010000000000 

000000100010000100000 

010010000000100000001 

100110000000000000000 

000101100000000000000 

010000010000100000000 

000000001000010000100 

110100000000000000000

110011100111100111101

100100000001000000000 

000010100011000000000 

000010000001000010000 

101011101000110011010 

010000001011000010101 

000000100001100101001 

111110111111110111101 

110110101011101111000 

100000000000101100000 

100011101001000000110

000010100001111101101 

110110010011101011000 

010001001000000100011 

110000100000000000001

110010000001010010000 

000001000000100101001 

111111101111110111101 

010101000000001100000 

010000100000000100001

100100000001000000000 

110000010000000000001 

000000100100000001000 

010000000001100000000 

110000100000000100001

010010100000000000001 

100000100001000000001 
010010100011000000000

000100000000000010101 

000000000001000100010 

000010001000000010000 

000010100001000000000 

001010001110010010000 

000000000000110010000

110010000000000000000 

101111000000000000000 

100010000001000000000 

010000000100100000001

000110000000000100000 

101110000000000000000 

000010100001000000001 

010000100000000100000 

111010000000000000000 

100010100000000000001 

110100000000000000001 

010001010000000000000 

000110100011000011000 

000000000001001000000 

000110100010000000100 

001011100000000110010 

110010101001010100001 

100001001001000011000 

010010000000000011001 

001000001010001000010 

110010100000000010100 

000011101111000011000 

000000000001110001000 

101011001001000010011 

001001100100100010001 

011110100111111111101 

000010101011000010000 
010110100010000000001 
111111101011010111001 
001011010001101110000 
0100001000
0100100000 
0000100000 
0010000000 
0010100000 
0000001010 
0000100010 
1001001000 
1101100000 
0100101000 
1101101000 
0010100000 
0011101010 
0101101000 
1000101000 

0000001010

1101100100 
0000000010 
0111101000 
0010100000 
0000111010 
0000001000 
0000010000 
0000011000 
0000111000
0100111000
0111111110
1000111010
0100001000 
1001101010 
1100000010 
0101001000 
1111101000 
1110100000 
0000001000 
0111101000 
11000101011
01100100000 
00000011001 
01000001000 
01000000100 
01010010000 
01000100000
01000010100
01000110000
00100100001
00000100000
Claims

1. A method of filtering data to predict an
observation about an item for a particular case, in
which: a set of data representing actual observations
about a plurality of items for a plurality of different
cases is modelled as a function of a plurality of case
and item profiles, each profile being a set of
parameters comprising at least one hidden metrical
variable, the parameters defining characteristics of the
respective case or item;

a best fit of the function to the data is
approximated in order to find the values of the item
profiles; and

the profiles found are used together with the
function to predict an observation for a particular case
about one or more items for which data is not available
for that case.

2. A method as claimed in claim 1, wherein the
function which models the data set comprises a plurality
of models, each model representing the observations
about one item for the cases in the data set.

3. A method as claimed in claim 1 or 2, wherein each
model is derived by identifying a model type which
approximates the closest fit to the data available for
the item in question.

4. A method as claimed in claim 1, 2 or 3, wherein in
the function which models the data set, the observations
about items for cases are independent, conditional on
the case profiles.

5. A method as claimed in any preceding claim, wherein
the models which make up the function are learnt from
past observations.

6. A method as claimed in any preceding claim, wherein
point estimates of the parameters of the case and item
profiles are found for the dataset and these are used to
predict an observation.

7. A method of filtering data to predict an
observation about an item for a particular case, in
which a set of data is obtained representing actual
observations for a plurality of cases, including the
particular case, about a plurality of items, a function
which models the data set is solved so that the data is
decomposed into a plurality of case profiles and item
profiles, and an observation for the particular case
about an item is predicted using the case profiles and
item profiles obtained.

8. A method as claimed in claim 6 or 7, wherein the
function is maximised so as to determine the case and
item profiles.

9. A method as claimed in claim 8, wherein the data
set is modelled as a function of the likelihood of the
data in the data set being present and the function is
solved by choosing item profiles and case profiles which
maximise the likelihood of the data in the data set
being present.

10. A method as claimed in claim 8 or 9, wherein the
function is maximised iteratively such that one of the
case and item profiles is held constant during each step
of an iteration.
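
The alternating scheme of claim 10 can be illustrated with a deliberately simplified sketch. It is not the model of the description: it assumes one-dimensional case and item profiles and a Gaussian likelihood, so that maximising the likelihood is the same as minimising the squared error between each observation and the product of the two profiles. All names (`update_cases`, `update_items`, `alternate`) are hypothetical.

```python
def squared_error(x, z, b):
    """Sum of squared differences between x[i][j] and z[i] * b[j]."""
    return sum(
        (x[i][j] - z[i] * b[j]) ** 2
        for i in range(len(z))
        for j in range(len(b))
    )


def update_cases(x, b):
    """Best case profiles with the item profiles held constant."""
    denom = sum(v * v for v in b)
    if denom == 0:
        raise ValueError("item profiles are all zero")
    return [sum(row[j] * b[j] for j in range(len(b))) / denom for row in x]


def update_items(x, z):
    """Best item profiles with the case profiles held constant."""
    denom = sum(v * v for v in z)
    if denom == 0:
        raise ValueError("case profiles are all zero")
    return [
        sum(x[i][j] * z[i] for i in range(len(z))) / denom
        for j in range(len(x[0]))
    ]


def alternate(x, sweeps=5):
    """Alternate the two updates and record the error after each sweep."""
    b = [1.0] * len(x[0])
    z = [0.0] * len(x)
    history = []
    for _ in range(sweeps):
        z = update_cases(x, b)   # items fixed, cases updated
        b = update_items(x, z)   # cases fixed, items updated
        history.append(squared_error(x, z, b))
    return z, b, history


# Toy observations (cases in rows, items in columns); values are made up.
observations = [[1.0, 0.5, 2.0], [2.0, 1.0, 4.0], [3.0, 1.5, 6.0]]
case_profiles, item_profiles, errors = alternate(observations)
```

Because each update solves its own sub-problem exactly, the error cannot increase from one sweep to the next; for the toy data above, which has an exact rank-one structure, it falls to essentially zero.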

11. A method as claimed in any of claims 1 to 5, 
wherein the function which models the dataset is a 
function of a prior distribution over possible case 
profiles and point estimates of the item profiles are 
then obtained. 

12. A method of filtering data to predict an
observation about an item for a particular case, in
which a set of data is obtained representing actual
observations for a plurality of cases about a plurality
of items, a function which models the data set as a
function of a plurality of item profiles and a prior
distribution over a plurality of possible case profiles
is set up to provide point estimates of the item
profiles that fit the function to the data, and an
observation about an item for a particular case is
predicted using the item profile point estimates
obtained together with a set of data representing
observations about a plurality of items for the said
particular case.

13. A method as claimed in any preceding claim, wherein
the observation is predicted by updating a prior
distribution over possible case profiles using Bayesian
inference.

14. A method of filtering data to predict an
observation about an item for a particular case, in
which a set of data representing actual observations for
a plurality of cases about a plurality of items is
modelled by a function, and the function is solved so as
to decompose the data into a plurality of case profiles
and a plurality of item profiles, and an observation for
the particular case about an item is predicted by
Bayesian inference using the case profiles and item
profiles obtained together with a set of data
representing observations about a plurality of items for
the said particular case.



15. A method as claimed in claim 14, wherein the case
profiles obtained are used to obtain a prior probability
distribution over possible case profiles for the said
particular case and the prior probability distribution
is then used in the Bayesian inference.

16. A method as claimed in claim 15, wherein the prior
probability distribution is generated by taking an
average of the case profiles in the data set.

17. A method as claimed in claim 16, wherein a
posterior probability distribution over possible case
profiles for the said particular case is generated from
the prior probability distribution by Bayesian inference
using the set of data relating to the said case and the
function modelling the likelihood of the data set being
present.

18. A method as claimed in claim 17, wherein the
posterior probability distribution is used to generate a
probability distribution over possible observations
about items for the particular case.
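
The Bayesian updating described in claims 13 to 18 can be sketched on a discrete grid of possible case profiles. This is an illustration under stated assumptions, not the implementation of the description: it assumes a single hidden variable per case, a standard normal prior evaluated on a grid, binary observations, and a logistic item model with a hypothetical intercept and slope for each item. All names and parameter values are made up for the example.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def normal_prior(grid):
    """Standard normal weights on the grid, normalised to sum to one."""
    weights = [math.exp(-0.5 * z * z) for z in grid]
    total = sum(weights)
    return [w / total for w in weights]


def posterior(grid, prior, items, observed):
    """Update the prior using only the items actually observed for the case.

    `items` is a list of (intercept, slope) pairs; `observed` maps an item
    index to the binary observation (1 or 0) recorded for that item.
    """
    weights = []
    for z, p0 in zip(grid, prior):
        likelihood = 1.0
        for j, x in observed.items():
            intercept, slope = items[j]
            p = sigmoid(intercept + slope * z)
            likelihood *= p if x == 1 else 1.0 - p
        weights.append(p0 * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def predict(grid, distribution, items, j):
    """Probability of a positive observation on item j under a distribution."""
    intercept, slope = items[j]
    return sum(w * sigmoid(intercept + slope * z)
               for z, w in zip(grid, distribution))


grid = [-3.0 + 0.25 * k for k in range(25)]
prior = normal_prior(grid)
items = [(-1.0, 1.5), (-0.5, 1.0), (0.0, -0.8)]  # hypothetical item profiles

post = posterior(grid, prior, items, observed={0: 1})
before = predict(grid, prior, items, 1)
after = predict(grid, post, items, 1)
```

In this toy setting a positive observation on item 0 shifts the posterior towards larger profile values, so the predicted probability for item 1, which also has a positive slope, is higher after the update than before.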

19. A method as claimed in any of claims 13 to 18,
wherein only the data relating to those items for which
observations have been obtained for the case is used in
updating the prior distribution over possible case
profiles.

20. A method as claimed in any of claims 13 to 19,
wherein the item profiles are estimated as those
parameters which maximise the fit between the function
which models the data set and the data.

21. A method as claimed in any of claims 13 to 20,
wherein the number of components of each item profile is
set to maximise the effectiveness of the function in
making predictions.

22. A method as claimed in claim 21, wherein the number
of components is set using standard model selection
techniques such as the Akaike information criterion.
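
Model selection by the Akaike information criterion, as mentioned in claim 22, can be sketched as follows. The sketch uses the standard definition AIC = 2k - 2 ln L, where k is the number of fitted parameters and ln L the maximised log-likelihood; the function names and the candidate figures are hypothetical and serve only to show the comparison.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion; lower values are preferred."""
    return 2 * n_parameters - 2 * log_likelihood


def choose_components(candidates):
    """Pick the number of components whose fitted model has the lowest AIC.

    `candidates` maps a number of components to a pair
    (maximised log-likelihood, number of fitted parameters).
    """
    return min(candidates, key=lambda k: aic(*candidates[k]))


# Made-up fits for profiles with 1, 2 and 3 components.
fits = {1: (-1250.0, 40), 2: (-1180.0, 60), 3: (-1170.0, 80)}
best = choose_components(fits)
```

For these invented figures the two-component model is chosen: the third component improves the log-likelihood by less than the penalty for its extra parameters.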

23. A method as claimed in claim 11 or 12, wherein the
data set is modelled as a function of the expected
likelihood of the data in the data set being present and
the item profiles are chosen as the parameter values
which maximise the likelihood of the data in the data
set being present given the function and the assumed
prior distribution of the case profiles.

24. A method as claimed in claim 23, wherein the
function is maximised iteratively and, preferably, an EM
algorithm is used to do this.

25. A method as claimed in any of claims 13 to 24,
wherein the prior distribution over each component of
the plurality of possible case profiles is assumed to be
a standard normal distribution and the components are
assumed to be independent.

26. A method as claimed in claim 25, wherein this
distribution is also used in the Bayesian inference to
estimate the observation about an item for the
particular case.

27. A method as claimed in any of claims 13 to 26,
wherein a posterior probability distribution over
possible case profiles for the said particular case is
generated from the prior probability distribution by
Bayesian inference using the set of data relating to the
said particular case and the function modelling the
likelihood of the data set being present.

28. A method as claimed in claim 27, wherein the
posterior probability distribution is used to generate a
probability distribution over possible observations
about items for the particular case.
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29. A method as claimed in any preceding claim, wherein 
each case is a different user of a prediction system 
such that observations by that user about various items 
are included in the dataset. 

5 

30. A method as claimed in claim 29, wherein the 
function is made up of a plurality of models, each model 
representing the suitability of an item for a user. 

10 31. A method as claimed in claim 30, wherein each model 
of the suitability of an item for a user depends 
directly only on the case profile for that user and the 
profile for that item, and not directly on any of the 
data relating to the suitability for the user of any 

15 other item. 



32. A method of filtering data to predict an 
observation about an item for a particular case, in 
which a set of data is obtained representing actual 
observations for a plurality of cases about a plurality 
of items, a function which models the data set as a 
function of a set of case profiles and a set of items 
profiles comprising sets of parameters is set up, 
wherein the case and item profiles each comprise at 
least one hidden metrical variable, the parameters 
defining the characteristics of each said respective 
case and item, the method comprising the steps of: 

a) estimating the values of the case profile 
parameters by solving a hidden variable model of 
the dataset; 

b) using the estimated values of the case profile 
metrical variables in the function to estimate the 
values of the item profile metrical variables; and 



c) predicting an observation about an item for a 
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particular case using the item profile values 
obtained together with a set of data representing 
observations about a plurality of items for the 
said particular case. 
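
By way of illustration only, steps (a) to (c) above might be sketched as follows. The sketch assumes a single hidden metrical variable per case and per item, a fully observed ratings matrix, and a linear function of the form rating = a_j + b_j * z_i; none of these choices, nor the function names, are taken from the claims. Step (a) uses the first principal component as the hidden variable model.

```python
import math


def estimate_case_profiles(ratings, iterations=200):
    """Step (a): score of each case on the first principal component of the
    column-centred ratings matrix, found here by power iteration."""
    n_cases, n_items = len(ratings), len(ratings[0])
    means = [sum(row[j] for row in ratings) / n_cases for j in range(n_items)]
    centred = [[row[j] - means[j] for j in range(n_items)] for row in ratings]
    scores = [float(i + 1) for i in range(n_cases)]  # arbitrary non-constant start
    for _ in range(iterations):
        loadings = [sum(centred[i][j] * scores[i] for i in range(n_cases))
                    for j in range(n_items)]
        scores = [sum(centred[i][j] * loadings[j] for j in range(n_items))
                  for i in range(n_cases)]
        norm = math.sqrt(sum(s * s for s in scores))
        scores = [s / norm for s in scores]
    return scores, means


def estimate_item_profiles(ratings, scores, means):
    """Step (b): with case profiles fixed, fit rating = a_j + b_j * z_i for each
    item by least squares (the maximum likelihood fit under Gaussian noise).
    The scores have zero mean, so the intercept a_j is the item mean."""
    denom = sum(s * s for s in scores)
    profiles = []
    for j, a_j in enumerate(means):
        b_j = sum(scores[i] * (ratings[i][j] - a_j)
                  for i in range(len(ratings))) / denom
        profiles.append((a_j, b_j))
    return profiles


def predict_for_case(observed, profiles, target_item):
    """Step (c): estimate the particular case's profile from its own
    observations, then predict the target item."""
    num = sum(profiles[j][1] * (r - profiles[j][0]) for j, r in observed.items())
    den = sum(profiles[j][1] ** 2 for j in observed)
    z = num / den
    a, b = profiles[target_item]
    return a + b * z


# Toy, noise-free data: four cases (rows) rating three items (columns).
ratings = [[2.0, 2.5, 2.0], [3.0, 2.0, 4.0], [4.0, 1.5, 6.0], [5.0, 1.0, 8.0]]
scores, means = estimate_case_profiles(ratings)
profiles = estimate_item_profiles(ratings, scores, means)
# A new case has rated items 0 and 1 only; predict its rating of item 2.
prediction = predict_for_case({0: 6.0, 1: 0.5}, profiles, 2)
```

The scale and sign of the hidden variable are arbitrary in such a model; predictions are unaffected because the item slopes are fitted against the same scores.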

33. A method as claimed in claim 32, wherein the case
profile values are estimated by solving a hidden
variable model of the dataset to find approximate values
of the item profile variables and the approximate item
profile values are then used to estimate the case
profile values.

34. A method as claimed in claim 33, wherein the hidden
variable model used is a linear model such as, for
example, a standard linear factor model or principal
component analysis.

35. A method as claimed in any of claims 32 to 34,
wherein the estimated case profile values are
substituted into the function modelling the dataset
which is then solved using maximum likelihood techniques
to find the item profile values.

36. A method as claimed in any of claims 32 to 35,
wherein items in the dataset are considered as belonging
to a plurality of different groups, each group having a 
different set of case profiles associated with it so 
that the case profile values for each group are 
estimated separately. 

37. A method as claimed in any of claims 32 to 36, 
wherein some items in the dataset are treated directly 
as observed components of the case profile, i.e. as 
values of one or more of the metrical variables. 

38. A method as claimed in any of claims 32 to 37,
wherein the prediction of an observation about an item
for the case is made by updating a prior distribution
over possible profiles for the case by Bayesian
inference and then using the updated case profile
obtained together with the function modelling the
dataset and the estimated item profile values to make
predictions.
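
As a hedged illustration of the Bayesian update recited in claim 38, the sketch below assumes a scalar case profile z with a Gaussian prior, and a linear-Gaussian function rating = a + b * z + noise for each item. The model form, the item names and all numerical values are assumptions made for the example and are not taken from the claims.

```python
def update_case_profile(prior_mean, prior_var, observations, item_profiles, noise_var):
    """Conjugate Gaussian update of a scalar case profile z.

    Assumed model: rating = a + b * z + noise, noise ~ N(0, noise_var),
    prior z ~ N(prior_mean, prior_var). Returns (posterior mean, posterior variance).
    """
    precision = 1.0 / prior_var
    weighted = prior_mean / prior_var
    for item, rating in observations.items():
        a, b = item_profiles[item]
        precision += b * b / noise_var
        weighted += b * (rating - a) / noise_var
    post_var = 1.0 / precision
    return weighted * post_var, post_var


def predict(item, post_mean, post_var, item_profiles, noise_var):
    """Predictive mean and variance of the rating for an unseen item."""
    a, b = item_profiles[item]
    return a + b * post_mean, b * b * post_var + noise_var


# Hypothetical item profiles (intercept a, slope b) estimated beforehand.
item_profiles = {"A": (3.0, 1.0), "B": (2.0, 2.0)}
post_mean, post_var = update_case_profile(0.0, 1.0, {"A": 4.0}, item_profiles, 1.0)
pred_mean, pred_var = predict("B", post_mean, post_var, item_profiles, 1.0)
```

With a single observation the posterior lies halfway between the prior mean and the value implied by the rating, and its variance is halved; further observations would narrow it again.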



39. A method as claimed in any of claims 32 to 37, 
wherein an observation about an item for the case is 
estimated by maximising the likelihood of the data 
relating to the case in question given the function 
modelling the dataset and the estimated item profile 
values to find the values of the case profile, and then 
using the case profile obtained together with a 
likelihood function and the estimated item profiles to 
predict observations about items for that case. 

40. A method as claimed in any preceding claim, wherein 
the method for estimating an observation about an item 
for the case is implemented using a software program 
that manipulates Bayesian networks. 



41. A method as claimed in any preceding claim, wherein
the item profiles and the prior distribution over
possible case profiles or the actual case profiles are
calculated in an off-line non real-time filtering engine
and are supplied to an on-line real-time engine for use
in the calculation of predicted observations for a case
when a set of data relating to the said case is supplied
to the real-time engine.

42. A method of filtering data to find items which are
similar to an item specified by a user, in which a set
of data representing observations about a plurality of
items for a plurality of cases is obtained, a function
which models the data set is used to estimate a
plurality of item profiles each containing a set of
parameters representing characteristics of the item and
at least one hidden metrical variable, and wherein items
which are similar to a specified item are found by
comparing the item profile of the specified item to
other item profiles.
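
The comparison of item profiles in claim 42 could, for example, be carried out with a simple distance measure. The sketch below assumes that each item profile is a short tuple of numbers and uses Euclidean distance; the claim does not specify the measure, and the item names and values are hypothetical.

```python
import math


def most_similar(target, profiles, k=2):
    """Return the k items whose profiles are closest to the target's profile."""
    t = profiles[target]
    distances = [(math.dist(t, p), name)
                 for name, p in profiles.items() if name != target]
    return [name for _, name in sorted(distances)[:k]]


# Hypothetical two-component item profiles.
profiles = {
    "a": (0.0, 0.0),
    "b": (1.0, 0.0),
    "c": (5.0, 5.0),
    "d": (0.0, 0.5),
}
neighbours = most_similar("a", profiles, k=2)
```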

43. A method of filtering data, in which a set of data
representing observations about a plurality of items for
a plurality of cases is obtained, a function which
models the data set is solved so that the data is used
to estimate a plurality of item profiles each containing
a set of parameters representing characteristics of the
item, and at least one hidden metrical variable, and
wherein cases and/or items are sorted into groups or
clusters such that each group contains cases or items
having similar case or item profiles.

44. A method as claimed in any preceding claim, wherein
statistical techniques are used to correct for bias in
the case data prior to predicting an observation about
an item for a particular case.

45. A method as claimed in any preceding claim, further
comprising the step of obtaining data relating to the
assessment by a plurality of users of one or more
exogenous standards so as to increase the amount and
range of data available.

46. A method of obtaining a data set from which the
suitability of a specific object for a user can be
estimated, in which data relating to the suitability for
a plurality of users of a plurality of related objects
is obtained together with data relating to the
preferences of those users for at least one exogenous
standard which is not directly related to the plurality
of related objects.

47. A method of obtaining a data set from which an
observation for a case about a specific object can be 
predicted, in which data relating to the observations 
for a plurality of cases about a plurality of predefined 
items is obtained and in which further data relating to 
one or more attributes of one or more of the predefined 
items may also be provided for one or more of the cases. 



48. A method as claimed in any preceding claim, wherein
a pre-filtering processing step is provided to carry out 
preliminary screening using objective criteria to reduce 
the number of items that must be assessed in the 
filtering step. 

49. A method as claimed in claim 48, wherein weighting
factors may be applied to the data relating to the 
observations about items for the cases prior to the 
filtering step. 

50. A method as claimed in claim 49, wherein the
weighting factors applied to the data reflect the time
that has elapsed since the time at which the observation
about the item was formed such that the weight of each
piece of data for predictive purposes declines with
time.

51. A method of weighting data relating to observations
about an item in which the weight of the data decreases
with an increase in the time elapsed since the
observation was made.
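
One possible form of the time-dependent weighting of claims 50 and 51 is an exponential decay. The sketch below is an assumption made for illustration only: the claims require merely that the weight declines with elapsed time, and the half-life parameter and function names are hypothetical.

```python
def decay_weight(age_days, half_life_days=180.0):
    """Weight of an observation that is age_days old; halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def weighted_mean(observations, half_life_days=180.0):
    """Time-weighted mean of (value, age_days) pairs."""
    weights = [decay_weight(age, half_life_days) for _, age in observations]
    total = sum(w * value for w, (value, _) in zip(weights, observations))
    return total / sum(weights)


# A fresh rating of 5 and a 180-day-old rating of 1 for the same item.
recent_biased_mean = weighted_mean([(5.0, 0.0), (1.0, 180.0)])
```

The older rating counts half as much as the fresh one, so the weighted mean lies closer to 5 than the unweighted mean of 3 would.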

52. A method as claimed in any of claims 48 to 51,
wherein a post-filtering processing step is provided in
addition to or instead of the pre-filtering processing
step.



53. A method as claimed in claim 52, wherein the
post-filtering processing step is a rules-based processing
step which excludes any items which do not fall within a 
defined set of criteria from the predictions output from 
the filtering step. 
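
A rules-based post-filtering step of the kind recited in claim 53 might be sketched as below. The catalogue fields, the rules and the item names are hypothetical and serve only to show predictions being excluded when an item falls outside a defined set of criteria.

```python
def post_filter(predictions, catalogue, rules):
    """Keep only predictions whose item satisfies every rule."""
    return [(item, score) for item, score in predictions
            if all(rule(catalogue[item]) for rule in rules)]


# Hypothetical catalogue attributes and exclusion criteria.
catalogue = {
    "book_a": {"in_stock": True, "price": 12.0},
    "book_b": {"in_stock": False, "price": 9.0},
    "book_c": {"in_stock": True, "price": 25.0},
}
rules = [
    lambda attrs: attrs["in_stock"],        # exclude unavailable items
    lambda attrs: attrs["price"] <= 20.0,   # exclude items above a price cap
]
predictions = [("book_a", 4.6), ("book_b", 4.4), ("book_c", 3.9)]
filtered = post_filter(predictions, catalogue, rules)
```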

54. A method as claimed in any preceding claim, wherein
a different type of output giving an estimated
prediction such as, for example, the generic mean of the
output can be substituted for filtering predictions
where, for whatever reason, there is insufficient
information concerning either one or more items within
the item database or concerning one or more cases.

55. A method as claimed in claim 54, wherein the
estimated predictions are replaced gradually by
predictions obtained from the filtering method of the
invention as more data becomes available.
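
The gradual replacement described in claims 54 and 55 could be realised in many ways. The sketch below assumes, purely for illustration, a blend whose weight on the filtering prediction grows with the number of observations available; the blending rule and the constant k are not taken from the claims.

```python
def blended_prediction(filtering_prediction, generic_mean, n_observations, k=10.0):
    """Blend the generic mean with the filtering prediction.

    With no observations the generic mean is returned unchanged; as
    n_observations grows, the weight on the filtering prediction tends to one.
    """
    weight = n_observations / (n_observations + k)
    return weight * filtering_prediction + (1.0 - weight) * generic_mean


cold_start = blended_prediction(4.0, 3.0, n_observations=0)
half_way = blended_prediction(4.0, 3.0, n_observations=10)
```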

56. A method as claimed in claim 53, wherein a manager
of the dataset generates a fixed number of phantom cases
such that the profile of an item for which insufficient
data is available is specified by the manager as being a
weighted average of some other items and the phantom
cases are specified to rate that item with ratings which
depend on the manually determined profile.

57. A method as claimed in any preceding claim, wherein
the method is used to provide a data filtering service
in which a database of observations about a plurality of
items for a plurality of users is obtained and analysed
on an exclusive basis for a single client.

58. A method as claimed in any of claims 1 to 56,
wherein the method is used to provide a data filtering
service in which a database of observations about a
plurality of items for a plurality of cases is obtained
and analysed to provide a database which may be pooled
with other databases, the filtering service operating
from the pooled databases via linkage preferably through 
a dedicated extranet. Under this arrangement a single 
history database (i.e. a data set representing the 
suitability of a plurality of objects for a plurality of 
users) may be established, developed and maintained for 
the class of clients being served as a whole. 

59. A method as claimed in claim 58, wherein the pooled 
database is configured such that, although the history 
database is held in common as described above, 
contributing websites retain either partial or complete 
exclusivity in relation to the inputs and outputs from 
the database in respect of those particular users that
register through their sites.

60. A method as claimed in claim 58, wherein database 
information concerning individual users may be held in a 
common pooled database but either partial or complete
exclusivity may be maintained by individual clients in
relation to inputs and outputs in relation to specific 
classes of item. 



61. A method as claimed in any preceding claim, wherein 
an indication of the level of personalisation of the 
predictions provided is given at the user interface. 

62. A method of providing an indication of the level of
personalisation of recommendations generated by a
collaborative filtering engine to a user at the user
interface.



63. A method as claimed in claim 61 or 62, wherein the 
indication of the level of personalisation is provided 
by a sliding scale representing a personalisation score. 

64. A method as claimed in any of claims 61 to 63,
wherein the recommendations are generated by a filtering
method according to any one of claims 1 to 41 and the
personalisation score is obtained by determining the
average variance of the probability distribution over
each characteristic for the case in question.
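
Claim 64 derives the personalisation score from the average variance of the distribution over each characteristic of the case. How that average is mapped to a displayed score is not stated; the sketch below assumes, as one possibility, a score between 0 and 1 obtained by comparing the average posterior variance with the average prior variance.

```python
def personalisation_score(posterior_variances, prior_variances):
    """Score in [0, 1]: 0 when nothing has been learned about the case,
    approaching 1 as the posterior variances shrink towards zero."""
    avg_posterior = sum(posterior_variances) / len(posterior_variances)
    avg_prior = sum(prior_variances) / len(prior_variances)
    return max(0.0, min(1.0, 1.0 - avg_posterior / avg_prior))


# Two characteristics, each with prior variance 1.0.
new_user = personalisation_score([1.0, 1.0], [1.0, 1.0])
known_user = personalisation_score([0.5, 0.5], [1.0, 1.0])
```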

65. A method as claimed in any of claims 61 to 64, 
wherein the recommendations provided to the user at the 
user interface are updated each time that the user
enters a further piece of information into the database.

66. A method as claimed in any of claims 61 to 65,
wherein the user interface is a web site and the
inputting of information is carried out on the same page
on which the personalisation level indicator and the
recommendations are displayed.

67. A method as claimed in any preceding claim, wherein 
each item in the data set is plotted against a first
component of the item profile and a second component of
the item profile on the x and y axes respectively. 

68. A method as claimed in claim 67, wherein if the 
user considers that the position of an item is
incorrect, he can move that item thus imposing a
different profile on it. 

69. A method of filtering data in which a function is 
set up which models a set of data representing
observations about a plurality of items for a plurality
of cases, as a function of a plurality of item profiles
and case profiles each containing a set of unknown
parameters defining characteristics of the case or item,
and a best fit of the function to the data is found in
order to find the values of the unknown parameters, the
unknown parameters for each item are compared to one
another and, if desired, an operator alters one or more
of the unknown parameters for one or more of the items
before using the sets of unknown parameters to analyse 
the underlying trends in the data. 

70. A method as claimed in claim 69, wherein the
parameters found together with the altered parameters
are used together with the function to predict an 
observation about one or more items for a particular 
case for which data is not available. 

71. A computer program product for carrying out the 
method as claimed in any preceding claim when run on 
computer processing means. 

72. A computer program product containing instructions
which when run on computer processing means will create 
a computer program for carrying out the method as 
claimed in any preceding claim. 

73. A method of filtering data to find items which are
suitable for a user, in which a set of data representing
observations about a plurality of items for a plurality
of users is obtained, a function which models the data
set is used to estimate a plurality of user profiles
each comprising a set of parameters representing
characteristics of the case, wherein items which were
preferred by users with similar user profiles to the 
user are recommended to that user. 



74. Data processing means programmed to carry out the 
method as claimed in any preceding claim. 